STATISTICAL THEORY OF PROPHYLACTIC AND THERAPEUTIC
TRIALS
II. METHODS OF OPERATIONAL ADVANTAGE


LANCELOT HOGBEN and RAYMOND WRIGHTON 


Department of Medical Statistics, University of Birmingham, and Department of Social and Industrial Medicine, 
University of Sheffield 


1. NEED FOR A NEW APPROACH 


Statistical procedures may subserve either of two 
ends. In the conduct of government, commerce, 
and manufacture it may be legitimate to invoke 
them with no aim other than to prescribe a course 
of action which limits certain assignable risks. We 
speak appropriately of any such prescription as 
conditional. In biological research our primary 
concern is to establish propositions worthy to 
take their place in the corpus of scientific knowledge 
accepted as a basis for subsequent action unrestricted 
by immediate administrative preoccupations. We 
speak of any such assertion as unconditional. 
Much needless confusion concerning the credentials 
of statistical techniques arises through failure to 
recognize how far each is meaningful in one or 
other domain. Since our concern in this context is 
with the validification of results obtained in the 
conduct of scientific research, unconditional statistical
inference alone is relevant to the end in view.


In our previous communication (Hogben and
Wrighton, 1952), we have recognized a broad
distinction between statistical procedures of two
sorts, respectively referred to as tests and estimation.
Under the first heading we have seen that it is now
necessary to distinguish two prescriptions:


(i) the significance test, which operates within the 
framework of a unique null hypothesis; 


(ii) the decision test, which involves the specification
of alternative admissible hypotheses.


It is likewise necessary to distinguish between two ways in which contemporary writers use the term estimation, viz., point estimation and interval estimation. In either case, our concern is with a parameter (or parameters) of a particular universe from which we may draw a sample. Point estimation undertakes to specify a unique value of the parameter as the best one; but in doing so it relinquishes the possibility of assigning an acceptable uncertainty safeguard to the form the assertion takes. Interval estimation repudiates the undertaking to specify any single value of the parameter as better than every other. Within the framework of an acceptable level of uncertainty, i.e., probability of false assertion, it subsumes rules of procedure which entitle us to make statements delimiting a range of values within which the parameter lies.


In the opening paragraph of this contribution, 
and elsewhere in the previous one, we have drawn 
an admittedly provisional distinction between 
conditional and unconditional assertions in terms 
of the uses to which we put them. This is clear-cut 
in the sense that: 


(a) any statement worthy to rank in the corpus 
of scientific knowledge is one which we can rightly 
describe as unconditional in the sense elsewhere 
defined;


(b) statements of the conditional sort may suffice 
as a basis for administrative decision. 


It is none the less possible to formulate rules of decision leading to unconditional statements of a sort rarely, if ever, relevant to the domain of research in pure science, and no more useful to the administrator for being more comprehensive in scope than a corresponding statement expressed in the more restricted form. Such is the class of decision tests which emerge in the theory of consumer and producer risk.


Further consideration of the Drosophila model of our earlier contribution will make this clear. We there set up two hypotheses: $H_a$, that $p = \tfrac{1}{2} = p_a$, and $H_b$, that $p = \tfrac{2}{3} = p_b$, $p$ being the probability that any offspring of a particular mother will be female. If we make the rule to reject $H_a$ if $x > (a + \tfrac{1}{2})$ for the $r$-fold sample, denoting by $L_{x.a}$ the probability that it will contain $x$ females if $p = p_a$, we may assign as the conditional risk ($\alpha$) of rejection when $H_a$ is true:

$$\alpha = \sum_{x=a+1}^{r} L_{x.a}$$
Similarly, we may adopt $H_b$ as our null hypothesis and make the rule to reject it if $x < (b + \tfrac{1}{2})$. The corresponding conditional risk ($\beta$) of rejection is then:

$$\beta = \sum_{x=0}^{b} L_{x.b}$$


In either case, we attach an uncertainty safeguard ($\alpha$ or $\beta$) to a statement which is conditional in the sense that it refers to a risk we take of being wrong if a particular hypothesis is correct. Unless $a = b$, the simultaneous application of the two rejection criteria will not necessarily lead to a decision in favour of either hypothesis; but we can formulate a composite rule which must do so in the form:

reject $H_a$ if $x > (k + \tfrac{1}{2})$;
reject $H_b$ if $x < (k + \tfrac{1}{2})$.

We may then be able to choose $k$ so that $\alpha \simeq \gamma \simeq \beta$, if $r$ is fairly large. This leads to a conditional assertion which assigns $\gamma$ as the risk that we shall reject either hypothesis if true; but it does not assign an acceptable safeguard to any unconditional assertion about the outcome unless $H_a$ and $H_b$ are the only admissible hypotheses. We can make a more comprehensive type of statement if we restate our hypotheses in the form $H_a$, that $p \le p_a$, and $H_b$, that $p \ge p_b$; and we may still guarantee the termination of the test in a decision if we follow the same composite rule of rejecting $H_a$ when $x > (k + \tfrac{1}{2})$ and rejecting $H_b$ when $x < (k + \tfrac{1}{2})$. We may then define $L_{x.a}$ and $L_{x.b}$ in terms of $p_a$ and $p_b$ as before, and fix $k$ so that:

$$\sum_{x=k+1}^{r} L_{x.a} = \alpha = \sum_{x=0}^{k} L_{x.b}$$


Any value $p < p_a$ then makes the conditional risk of rejecting $H_a$ in its new form less than $\alpha$; and any value of $p > p_b$ makes the conditional risk of rejecting $H_b$ in its new form less than $\alpha$, which we may assign at any acceptable level if free to prescribe the sample size in advance. The rule itself limits our allowable positive statements to the alternatives $p > p_a$ and $p < p_b$. Except for the trivial case $p_a = 0$ or $p_b = 1$, it prohibits any statement of the form $p_a < p < p_b$. Since $\alpha$ sets a limit to the probability of any false assertion we may make, we are entitled to say that $P_f \le \alpha$ unconditionally defines the uncertainty safeguard of the entire class of statements which the rule subsumes; but we can state this only because the rule subsumes no possibility of simultaneous statement concerning the relation of $p$ to both $p_a$ and $p_b$.
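A minimal numerical sketch of the composite rule may help. It assumes $p_a = \tfrac{1}{2}$ and $p_b = \tfrac{2}{3}$ and a hypothetical sample of r = 100 offspring (the sample size and the helper names are ours, not the authors'). For each candidate k it computes $\alpha(k) = \sum_{x=k+1}^{r} L_{x.a}$ and $\beta(k) = \sum_{x=0}^{k} L_{x.b}$ from exact binomial terms and picks the k that makes the larger of the two risks as small as possible.

```python
from math import comb


def binom_pmf(x: int, r: int, p: float) -> float:
    """L_x: probability of exactly x females in an r-fold sample when P(female) = p."""
    return comb(r, x) * p**x * (1 - p) ** (r - x)


def risks(k: int, r: int, p_a: float, p_b: float) -> tuple[float, float]:
    """(alpha, beta) for the rule: reject H_a if x > k + 1/2, reject H_b if x < k + 1/2."""
    alpha = sum(binom_pmf(x, r, p_a) for x in range(k + 1, r + 1))  # reject H_a though p = p_a
    beta = sum(binom_pmf(x, r, p_b) for x in range(0, k + 1))       # reject H_b though p = p_b
    return alpha, beta


def balanced_k(r: int, p_a: float, p_b: float) -> int:
    """Criterion score k minimizing the larger of the two conditional risks."""
    return min(range(r + 1), key=lambda k: max(risks(k, r, p_a, p_b)))


r, p_a, p_b = 100, 1 / 2, 2 / 3  # hypothetical r; p_a and p_b as assumed above
k = balanced_k(r, p_a, p_b)
alpha, beta = risks(k, r, p_a, p_b)
print(f"k = {k}, alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Because x is discrete, the two risks can only be roughly equalized; enlarging r lowers both at the balancing k.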

If we know that the Drosophila culture contains several different genotypes to which we can assign values of $p$, we can meaningfully postulate prior probabilities referable to existent populations at risk to formalize the unconditional character of the final statement which the rule endorses. We must do so with due regard to its content, viz.: the probability of wrongly rejecting the hypothesis $p_a < p < p_b$ is zero, since the rule does not allow us to reject it. We may then set out the argument in terms of the following symbols, $\varepsilon$ being positive:

Hypothesis            Prior Probability    Conditional Uncertainty Safeguard
(1) p < p_a           P_1                  P_f.1 = α − ε_1
(2) p = p_a           P_2                  P_f.2 = α
(3) p_a < p < p_b     P_3                  P_f.3 = 0
(4) p = p_b           P_4                  P_f.4 = α
(5) p > p_b           P_5                  P_f.5 = α − ε_2

These hypotheses constitute an exclusive set, of which our verdict can embody the acceptance of only one. Hence the addition rule applies, and our unconditional uncertainty safeguard is:

$$P_f = P_1 P_{f.1} + P_2 P_{f.2} + P_3 P_{f.3} + P_4 P_{f.4} + P_5 P_{f.5}$$
$$= P_1(\alpha - \varepsilon_1) + P_2\alpha + P_4\alpha + P_5(\alpha - \varepsilon_2)$$
$$= (1 - P_3)\alpha - P_1\varepsilon_1 - P_5\varepsilon_2$$
$$\therefore\ P_f \le \alpha.$$

The prescription of such a rule presupposes two target values of $p$. These we can readily conceive in relation to standards of quality and to costing limits in an executive set-up, but the unconditional form the terminal statement assumes when we formulate a rule in this way embodies no relevant information other than the content of two types of conditional assertion. What the choice of a single acceptance-rejection criterion-score $k$ accomplishes is that a statistical inspection plan then achieves its task, i.e., the test must lead to a decision to reject either $H_a$ or $H_b$. In fact, both hypotheses may be wrong; and the unconditional form of the assertion is realizable only because the test can never lead to a corresponding assertion, i.e., a statement of the form $p_a < p < p_b$.

If we operate within the framework of a single hypothesis stated in the form $p \le p_a$ or $p \ge p_b$, and have defined our rejection criterion so that $P_f \le \alpha$ is the probability of rejecting it when true, we are free to limit our verdicts, as Fisher (1949)* does indeed prescribe, to the alternatives: hypothesis false and hypothesis unproven. In the sense that $P_f \le \alpha$ is then the probability of erroneously making an allowably decisive assertion, we might admittedly say that $P_f \le \alpha$ is the unconditional safeguard of our test procedure. We then evade the
Neyman-Pearson error of the second kind by exposing ourselves to situations in which the overwhelming majority of our decisions will assign the verdict unproven to a false null hypothesis. We can indeed avoid doing so only by prescribing sample size with due regard to the Neyman-Pearson concept of test power; but any attempt to rehabilitate the Yule-Fisher significance test on such terms undermines previous claims concerning the value of inference referable to small samples.

* “It should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation.” (“The Design of Experiments”, 5th ed., 1949, p. 16.)


That the distinction between conditional and
unconditional is not so clear-cut as we have
provisionally assumed in the foregoing contribution
therefore emphasizes the importance of examining
the advantages or drawbacks of any statistical 
procedure with due regard to the type of terminal 
statements which it can or cannot endorse. One 
puzzling feature of a test procedure which operates 
within the framework of a unique null hypothesis 
arises from the naive assumption that the appropriate 
form of the latter is, in the words of Fisher (1949),
the “hypothesis that the phenomenon to be demonstrated
is in fact absent”. Current practice interprets
this to signify that the true difference with reference 
to treatment efficacy is zero. All the test can thus 
achieve is to assess the risk of accepting one treat- 
ment as better than another when they are equally 
good. Unless it is clear that there exists on this 
planet a body of persons actively interested in a rule 
of procedure with such terms of reference, it can 
lead only to statements which are either non- 
committal or irrelevant. 


The administrator concerned with allocation of costly resources will wish to know whether Treatment B is at least so much better than Treatment A. The physician anxious to invoke any means of possible benefit will ask the same sort of question but set his target value at a lower level. The manufacturer eager to exploit a new prospect but alert to the danger of losing goodwill may wish to balance risks of two sorts by a dual test procedure such as the foregoing; and the research worker who invokes a statistical device to validate his findings will do so because he rightly or wrongly believes in its relevance to some form of unconditional assertion about how much the efficacy of Treatment B exceeds that of Treatment A.


Commonly, we shall not undertake a trial unless prior sources of information, such as experiments in vitro or on animals, have given us good reason to suppose that the new Treatment B is more efficacious than Treatment A. On that understanding, our practical interest in the outcome may be:

(a) to ensure that the patient has the benefit of a new Treatment B if its efficacy may exceed that of Treatment A by as much as $x_1$;

(b) to avoid substituting Treatment B for Treatment A at a cost disproportionate to the benefit conferred unless the efficacy of Treatment B exceeds that of Treatment A by at least $x_2$.


We subsume both objectives in the type of statement with which interval estimation deals, namely $x_1 < d < x_2$; but we then relinquish the right to fix $x_1$ and $x_2$ in advance. Alternatively, we may adopt the dual test approach, e.g. we may equalize the risks ($\alpha = \beta$) of rejecting Treatment B as a better substitute if it is at least $x_2 = 10$ per cent. more effective than Treatment A, and of accepting it as a better substitute if it is no more than $x_1 = 2$ per cent. better than Treatment A in the same sense. Inter alia, we may then ask:

Is a sequential procedure based on such a choice 
preferable to the method of interval estimation? 


The question so stated is of topical interest, because it seemingly discloses the prospect of more rapid appraisal of treatment efficacy; but this hope may be illusory. We have first to suppose that the immediate assessment of efficacy is practicable pari passu with the assembly of data. We have also to suppose that the investigator can prescribe acceptable numerical values of $x_1$ and $x_2$. Aside from this, there is an as yet unresolved difficulty to face. The particular method for comparing two binomial parameters ($p_b$ and $p_a$) put forward by Wald (1947) takes as its criterion of relative efficacy the ratio $u = p_b(1 - p_a) / [p_a(1 - p_b)]$. The pivotal hypotheses are then definable in terms of agreed values ($u_0$ and $u_1$) this ratio may attain. In the therapeutic trial, however, the relevant criterion (vide p. 219 infra) is the difference $d = p_b - p_a$, and we cannot express $u_0$ and $u_1$ in terms of agreed alternative values $d_0$ and $d_1$ unless we know the true value of $p_a$ (or $p_b$) in advance.
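The difficulty can be made concrete. In the sketch below (our own illustration; the values of $p_a$ are hypothetical) the difference $d = p_b - p_a$ is held fixed at 0.10 while $p_a$ varies, and Wald's ratio $u = p_b(1 - p_a)/[p_a(1 - p_b)]$ is evaluated for each pair.

```python
def wald_ratio(p_a: float, p_b: float) -> float:
    """Wald's (1947) criterion u = p_b(1 - p_a) / [p_a(1 - p_b)]."""
    return p_b * (1 - p_a) / (p_a * (1 - p_b))


d = 0.10  # hypothetical agreed difference d = p_b - p_a, held fixed
for p_a in (0.10, 0.30, 0.50, 0.80):  # hypothetical true values of p_a
    p_b = p_a + d
    print(f"p_a = {p_a:.2f}, p_b = {p_b:.2f}, u = {wald_ratio(p_a, p_b):.3f}")
```

For these four values u comes out as 2.250, 1.556, 1.500 and 2.250, so one and the same d corresponds to different values of u, and an agreed d cannot be translated into an agreed u without knowing $p_a$.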

If we can indeed prescribe target values $x_1$ and $x_2$ in numerical terms, we are free to state the problem in the framework of alternative risks and to design the trial with due regard to economy of material; but those responsible for the design of a trial are rarely, if ever, in a position to do so. The investigator who claims to pursue truth for its own sake will prefer a procedure leading to some form of unconditional assertion concerning a range of values within which $d$ lies. To design a useful trial economically, he must then be able to specify how short the interval must be. This presupposes background knowledge outside the scope of statistical theory. Otherwise he can merely hope to make the interval $x_2 - x_1$ as small as the expenditure of available materials permits, and to locate it with due regard to the end in view. We may then explore the possibility of designing the trial in terms of treatment-group size to ensure that the length of the interval will not exceed an acceptable limit; but this presupposes some ulterior criterion of acceptable accuracy.

In our view, interval estimation is the one available 
procedure which offers the prospect of statistical 
validification of judgments which are the chief 
concern of the research worker in the conduct 
of the clinical trial. Its neglect in the domain of 
medical statistics would, therefore, be difficult to 
explain, if it were not also true that the basic 
postulates involve: 

(i) a radical departure from the concept of point 
estimation which is traceable to the Legendre- 
Gauss theory of errors; 

(ii) an overdue reorientation of our views concerning
the nature of statistical validification.


Because the new approach associated with the 
term confidence interval is still novel in the context 
of the clinical trial, it will not be profitless if we 
set forth its implications between two schools of 
doctrine against the background of simple statistical 
models with numerical illustrations. 

The development of the theory is largely due to 
J. Neyman; but credit for an early explicit statement 
of a procedure appropriate to large samples in the 
domain of taxonomic scoring is due to Wilson (1927). 
Since Wilson’s contribution has received little
recognition, it will not be out of place to quote his 
words from a later paper (Wilson, 1942): 

In 1927 I called attention to the fact that many statements about probability are highly elliptical and illustrated the matter by the simple case of a point-binomial universe with unknown probability $p$ and observed value $p_0$ in some sample. Using the admittedly rough estimate of probability based on the standard deviation one ordinarily writes:

$$p_0 - \lambda\sqrt{p_0 q_0 / n} < p < p_0 + \lambda\sqrt{p_0 q_0 / n};$$

and states that the probability that the true value $p$ in the universe lies between the limits given may be had from a probability-integral table entered with a normal deviation of $\lambda$ units. I urged that a better procedure would be to use for the standard deviation the value $\sqrt{pq/n}$ obtained from the unknown $p$ of the universe, which leads to:

$$\frac{p_0 + t^2/2}{1 + t^2} - \frac{\sqrt{p_0 q_0 t^2 + t^4/4}}{1 + t^2} < p < \frac{p_0 + t^2/2}{1 + t^2} + \frac{\sqrt{p_0 q_0 t^2 + t^4/4}}{1 + t^2}$$

[where $t = \lambda/\sqrt{n}$.]


2. ONE APPROACH TO CONFIDENCE THEORY 

One may approach the method of interval estimation against the background of two types of model situation. The more direct accepts the factual restriction of the topic to a single though unknown universe of choice; and is therefore under no obligation to introduce the nebulous prior probabilities of Bayes’s theorem, which rightly pertains only to situations admitting sampling in two stages. Alternatively, we may conceptualize it in terms of a Bayes’s situation to make their irrelevance more explicit. The model we shall then invoke will also help us both to materialize the relation of interval estimation to the new theory of test procedure and to exhibit this relation as one to which the concept of prior probability, once elevated to a more commanding status in the theory of statistical inference, is also irrelevant. Our first series of models illustrates the direct approach.


MODEL I (a).—We shall conceive that a lottery wheel has 1,024 sectors labelled with scores x, (x + 1), (x + 2), (x + 3) ... (x + 9), (x + 10), respectively allocated to 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1 sectors. So much we know; but we do not know the numerical value of x itself. At each spin we record as our score that of the sector opposite a fixed pointer. We now suppose that we spin the wheel forty times and record the mean score ($M_x$) of the 40-fold sample. Our problem is to define what we can legitimately assert about x. We shall first assemble available information relevant to the formal solution.

The long-run mean value ($M$) of the score of any sample is, of course, $(x + 5)$; and the terms of $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$ define the unit sample distribution (u.s.d.) of the universe with variance $\sigma^2 = 2.5$, whence that of the distribution of the 40-fold sample mean is:

$$\sigma_m^2 = \frac{2.5}{40} = \frac{1}{16}.$$

Thus $\sigma_m = 0.25$; and the error involved in a normal quadrature for the distribution of the sample means is trivial. We can thus say that:

(a) the mean ($M_x$) of 2.5 per cent. of all samples in the long run will exceed $M + 2\sigma_m = M + 0.5$;

(b) the mean of 2.5 per cent. of all samples in the long run will be less than $M - 2\sigma_m = M - 0.5$;

(c) the mean of 95 per cent. of all samples will lie in the range $M \pm 2\sigma_m = M \pm 0.5$.

We now prescribe the following rule. We shall consistently disregard any values of $M_x$ if they lie outside the range $M \pm 2\sigma_m$, thus asserting of any sample within our experience that:

$$M - 2\sigma_m < M_x < M + 2\sigma_m \qquad \text{(i)}$$

If we do follow this rule consistently, 95 per cent. of our assertions will be true in the long run, i.e., within the framework of an indefinitely protracted series of trials. Now the foregoing is equivalent to the alternative assertion:

$$M_x + 2\sigma_m > M > M_x - 2\sigma_m \qquad \text{(ii)}$$

Thus 95 per cent. of our assertions will also be correct if we say that $M$ lies within the range of values so defined. We can set out the above reasoning in tabular form (below).











Event                              Probability of    Equivalent Assertion               Probability of
                                   its Occurrence                                       its Truth
M_x > M + 2σ_m                     0.025             M < M_x − 2σ_m                     0.025
M_x < M − 2σ_m                     0.025             M > M_x + 2σ_m                     0.025
M − 2σ_m < M_x < M + 2σ_m          0.95              M_x + 2σ_m > M > M_x − 2σ_m        0.95





Let us now suppose that a recorded result of a 40-fold spin is that the mean score is 6.3, and that we have no other information at our disposal. If we proceed consistently within the framework of our rule, we shall say that we attach a 5 per cent. uncertainty safeguard to the assertion: $M$ lies within the range 6.3 ± 0.5, or 5.8 to 6.8. Since $M = x + 5$ by definition, we can say with equal confidence that x itself lies within the range 0.8 to 1.8 inclusive, assigning 1 as the correct value
(at the 95 per cent. confidence level), if x is an integer. 
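The arithmetic of Model I (a) can be checked directly from the sector counts. The sketch below (plain Python; the variable names are ours) recovers the unit variance σ² = 2.5 from the 1,024 sectors, the standard error σ_m = 0.25 of a 40-fold mean, and the limits for M and x implied by the recorded mean of 6.3.

```python
from math import comb, sqrt

counts = [comb(10, j) for j in range(11)]  # sectors scoring x + j: 1, 10, 45, ..., 1
assert sum(counts) == 1024

# Work with the offset j = score - x, which has weights counts[j] / 1024.
mean_offset = sum(j * c for j, c in enumerate(counts)) / 1024  # 5.0, so M = x + 5
variance = sum((j - mean_offset) ** 2 * c for j, c in enumerate(counts)) / 1024  # 2.5
sigma_m = sqrt(variance / 40)  # standard error of a 40-fold sample mean

m_x = 6.3  # recorded mean score of the 40-fold sample
low_M, high_M = m_x - 2 * sigma_m, m_x + 2 * sigma_m  # limits for M
low_x, high_x = low_M - mean_offset, high_M - mean_offset  # limits for x = M - 5

print(mean_offset, variance, sigma_m)
print(round(low_M, 2), round(high_M, 2), round(low_x, 2), round(high_x, 2))
```

The first line printed is `5.0 2.5 0.25` and the second `5.8 6.8 0.8 1.8`, matching the figures in the text.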
Fig. 1.—Graphical representation of the two-sided confidence
limits for the sample mean of a normal variate of known variance
(Model I.a).


Fig. 1 exhibits the argument based on our lottery wheel model within the range of values 5 < M < 9 and 0 < x < 4. For any value of $M$ we deny the occurrence of all values of $M_x$ greater than $M + 2\sigma_m$ or less than $M - 2\sigma_m$, with a probability of erroneous rejection approximately equal to 0.05. Thus 95 per cent. of all sample values of $M_x$ will lie within the two parallel lines $M_x = M + 2\sigma_m$ and $M_x = M - 2\sigma_m$. There will correspond to any observed value of $M_x$ (e.g., $M_x = 6.3$) two values of $M$ (5.8 and 6.8 if $M_x = 6.3$) where the line through $M_x$ parallel to the abscissa cuts these two lines. These two values will define the range of $M$ consistent with the probability of error assigned to our denial of the limits of admissible values of $M_x$. The specification of the probability of error, i.e., uncertainty safeguard, being in this context 5 per cent., presupposes
that we follow the rule regardless of the structure of 
any particular sample. In one sense, therefore, we 
imply the existence of a rule stated in advance of the 
examination of the data. This pinpoints the reorientation 
referred to earlier. It is misleading to speak of statistical 
validification as a technique for weighing the evidence 
any single sample supplies. It would be more correct 
to say that statistical theory weighs the ways in which 
we propose to weigh evidence supplied by samples. 
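The point that the 5 per cent. safeguard belongs to the rule rather than to any one sample can be illustrated by simulation. The sketch below is our own illustration, not part of the original argument: it assumes a hypothetical true value x = 1, spins the Model I (a) wheel forty times per trial, applies the assertion $M_x - 0.5 < M < M_x + 0.5$ in every trial, and reports the proportion of trials in which that assertion is true. The result should lie close to 0.95, though not exactly on it, since 2σ_m is only the conventional approximation to the 95 per cent. point and $M_x$ is discrete.

```python
import random
from math import comb

OFFSETS = list(range(11))                 # score offsets 0..10 above the unknown x
WEIGHTS = [comb(10, j) for j in OFFSETS]  # 1, 10, 45, ..., 1 sectors out of 1,024


def sample_mean(rng: random.Random, x: int, spins: int = 40) -> float:
    """Mean score M_x of a `spins`-fold sample from the Model I (a) wheel."""
    draws = rng.choices(OFFSETS, weights=WEIGHTS, k=spins)
    return x + sum(draws) / spins


def coverage(trials: int, x: int = 1, seed: int = 1952) -> float:
    """Share of trials in which the rule's assertion M_x - 0.5 < M < M_x + 0.5 is true."""
    rng = random.Random(seed)
    true_M = x + 5         # long-run mean score
    half_width = 2 * 0.25  # 2 * sigma_m
    hits = 0
    for _ in range(trials):
        m_x = sample_mean(rng, x)
        if m_x - half_width < true_M < m_x + half_width:
            hits += 1
    return hits / trials


print(coverage(20_000))
```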

In one respect, the foregoing model is highly artificial, i.e., we know in advance the numerical value of the variance ($\sigma^2$) of the u.s.d. and hence that of $\sigma_m$. When sampling from a putatively normal universe we rarely, if ever, have such knowledge; but we do know the distribution of the ratio of $(M_x - M)$ to the unbiassed sample estimate ($s_m$) of $\sigma_m$. The t-function specifies the sample distribution of the ratio of these two sample statistics. Hence we can get from the t-table upper and lower limits for $M$ consistent with any assigned probability of erroneous statement within the framework of repeated application of the rule; and we can do the same for $\sigma^2$ itself by recourse to tables of the $\chi^2$ distribution.

If (as usually) we do not know the exact value of $\sigma_m$ but only the estimate $s_m$ based on an r-fold sample, we can use the t-ratio with (r − 1) degrees of freedom:

$$t = \frac{M_x - M}{s_m}.$$

The column for (r − 1) degrees of freedom gives the value $\pm t_{0.05}$ of t such that P = 0.05 is the probability that t lies outside these limits. We can thus say of 95 per cent. of samples that $(M_x - M)/s_m$ lies within the range $\pm t_{0.05}$. Of 95 per cent. of samples we can thus say that:

$$(M_x - s_m t_{0.05}) < M < (M_x + s_m t_{0.05}).$$

Whence with a 5 per cent. uncertainty safeguard we can assert that:

$$M_x - s_m t_{0.05} < M < M_x + s_m t_{0.05} \qquad \text{(i)}$$
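A minimal sketch of rule (i) follows. It assumes ten hypothetical measurements (they are not data from the paper) and takes the tabulated two-sided value $t_{0.05} = 2.262$ for 9 degrees of freedom; the standard library has no t quantile function, so the table value is hard-coded.

```python
from math import sqrt
from statistics import mean, stdev

T_05_DF9 = 2.262  # two-sided 5 per cent. point of t for 9 degrees of freedom (from tables)


def t_limits(sample: list[float], t_crit: float) -> tuple[float, float]:
    """Limits M_x -/+ s_m * t_crit, with s_m the estimated standard error of the mean."""
    r = len(sample)
    m_x = mean(sample)
    s_m = stdev(sample) / sqrt(r)  # stdev uses the (r - 1) divisor
    return m_x - s_m * t_crit, m_x + s_m * t_crit


measurements = [5.1, 6.3, 5.8, 6.9, 6.0, 5.5, 6.4, 6.1, 5.7, 6.2]  # hypothetical data
low, high = t_limits(measurements, T_05_DF9)
print(f"{low:.3f} < M < {high:.3f}")
```

Replacing the hard-coded constant with a t quantile from a statistics library would be the natural extension; it is left out so that the sketch runs with the standard library alone.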





MODEL I (b).—Our last model invokes the system of scoring distinguished as representative in the previous contribution of this series. Our concern is then commonly with the sample mean of a set of measurements or counts. Our next model illustrates the confidence approach to estimation in the domain of taxonomic scoring, as when we estimate the proportion of affected persons in a treatment group. We now suppose that our lottery wheel has 100 sectors on each of which the number of pips is either 0 or 1. We do not know the number [100q] of sectors which carry no pips, or the number [100p = 100(1 − q)] of sectors which carry one pip. We spin it one hundred times and record the mean score. Our problem is to define confidence limits of p, the proportion of sectors which carry one pip. We are here sampling in an infinite two-class universe, and successive terms of $(q + p)^{100}$ define the frequencies of the observed proportionate (mean) score $p_0 = 0, 0.01, 0.02, 0.03, \ldots, 0.99, 1.0$. The unknown variance of the distribution of $p_0$ is given by:

$$\sigma_p^2 = \frac{pq}{100}.$$

Throughout the range of prescribed values, from p = 0.1 to p = 0.9 inclusive, the distribution of the observed proportionate score will be approximately normal. The range $p_0 = p \pm 2\sigma_p$ will therefore define the 95 per cent. confidence level well enough for expository purposes. Since $\sigma_p$ depends on p, being zero when p = 0 or p = 1, the two boundaries of acceptable values of $p_0$ will not be parallel straight lines as in Fig. 1. They will meet at p = 0 and p = 1, the upper being concave downwards, the lower being concave upwards, as in Fig. 2. The corresponding acceptable range of p values for any observed value of $p_0$ is obtainable graphically, as before, by drawing a horizontal line parallel to the abscissa; but the two limits are also given by the two roots of the quadratic:

$$(p_0 - p)^2 = 4\sigma_p^2 = \frac{4p(1 - p)}{100}.$$

If the observed mean value is 0.62, this becomes:

$$25(0.62 - p)^2 = p(1 - p),$$
$$\therefore\ 104p^2 - 128p + 38.44 = 0,$$
$$p \simeq 0.52 \text{ or } 0.71.$$
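The two roots can be checked by solving the quadratic directly. In the sketch below (our own illustration) the condition $(p_0 - p)^2 = h^2 p(1 - p)/r$ is rearranged as $(1 + h^2/r)p^2 - (2p_0 + h^2/r)p + p_0^2 = 0$; with $p_0 = 0.62$, r = 100 and h = 2 this is $1.04p^2 - 1.28p + 0.3844 = 0$, the equation above divided by 100.

```python
from math import sqrt


def roots(a: float, b: float, c: float) -> tuple[float, float]:
    """Real roots of a*p**2 + b*p + c = 0, smaller first (assumes a > 0, real roots)."""
    disc = sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)


p0, r, h = 0.62, 100, 2  # observed proportion, sample size, multiple of sigma_p
a = 1 + h * h / r          # coefficient of p**2
b = -(2 * p0 + h * h / r)  # coefficient of p
c = p0 * p0                # constant term
lo, hi = roots(a, b, c)
print(f"{lo:.2f} < p < {hi:.2f}")  # prints 0.52 < p < 0.71
```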


[Fig. 2: upper and lower boundaries of admissible values of $p_0$ (ordinate, 0 to 1.0) plotted against p (abscissa, 0 to 1.0).]

Fig. 2.—Graphical representation of the two-sided confidence limits for the proportionate score referable to a large sample from a two-class universe (Model I.b).

Note.—The dotted line in the terminal regions is to remind the reader that the normal approximation will hold good near the limits of the range only if the samples are very large.


At the 2σ (95 per cent.) confidence level, we shall therefore say that our lottery wheel has no more than 71 and no less than 52 sectors carrying one pip. Alternatively, we attach 0.05 as our uncertainty safeguard to the assertion. More generally, we may set our limits for admissible values of $p_0$ in the range $p \pm h\sigma_p$, so that the appropriate quadratic for r-fold samples is:

$$(p_0 - p)^2 = h^2\sigma_p^2 = \frac{h^2 p(1 - p)}{r}.$$

Whence we obtain Wilson’s solution:

$$p = \frac{2rp_0 + h^2}{2(r + h^2)} \pm \frac{h\sqrt{h^2 + 4rp_0(1 - p_0)}}{2(r + h^2)} \qquad \text{(ii)}$$
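Formula (ii) translates directly into code. The sketch below (function names are ours) evaluates Wilson's limits and, for comparison, the "rough" limits $p_0 \pm h\sqrt{p_0 q_0 / r}$ of Wilson's 1927 remark. For $p_0 = 0.62$, r = 100, h = 2 the Wilson limits reproduce the roots 0.52 and 0.71 given earlier; the second, hypothetical case ($p_0 = 0.05$, r = 40) shows the rough limits falling below zero near the end of the range.

```python
from math import sqrt


def wilson_limits(p0: float, r: int, h: float) -> tuple[float, float]:
    """Roots of (p0 - p)**2 = h**2 * p * (1 - p) / r, i.e. formula (ii)."""
    centre = (2 * r * p0 + h * h) / (2 * (r + h * h))
    half = h * sqrt(h * h + 4 * r * p0 * (1 - p0)) / (2 * (r + h * h))
    return centre - half, centre + half


def rough_limits(p0: float, r: int, h: float) -> tuple[float, float]:
    """p0 -/+ h * sqrt(p0 * q0 / r): standard deviation evaluated at the observed p0."""
    half = h * sqrt(p0 * (1 - p0) / r)
    return p0 - half, p0 + half


for p0, r in [(0.62, 100), (0.05, 40)]:
    w = wilson_limits(p0, r, 2)
    n = rough_limits(p0, r, 2)
    print(f"p0={p0}, r={r}: Wilson {w[0]:.3f} to {w[1]:.3f}; rough {n[0]:.3f} to {n[1]:.3f}")
```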








If the size of the sample is small, we can define for any value of p limits which exclude a proportion equal to or less than 2.5 per cent. (or other agreed figure) at either end of the range by recourse to the tables of the binomial (Clopper and Pearson, 1934; National Bureau of Standards, 1949). When we are comparing prophylactic or therapeutic measures with a low rate of attack or a high proportion of cures, as the case may be, p is by definition near to zero or to unity, in which event the condition implicit in (ii) will not hold good unless the size of the sample is very large. Even so, the order of error is difficult to assess when we invoke a continuous distribution such as the normal as a computing device for quadrature in the domain of discrete score values. This will come into focus in the next model situation we shall examine. We shall then see more clearly why we must confine statements about the confidence level to the form $P_f \le \alpha$ or $P_f < \alpha$ when the variate is discrete, as is always true of the taxonomic method of scoring most commonly used in therapeutic or prophylactic trials.
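For small samples the exact binomial limits cited above can be found by searching for the values of p at which a binomial tail just reaches the agreed 2.5 per cent. The sketch below is our own bisection-based illustration of that construction using exact binomial sums; it is not the procedure by which the cited tables were compiled, and the example count (3 affected out of 20) is hypothetical.

```python
from math import comb
from typing import Callable


def upper_tail(x: int, r: int, p: float) -> float:
    """P(X >= x) for X binomial with r trials and probability p."""
    return sum(comb(r, j) * p**j * (1 - p) ** (r - j) for j in range(x, r + 1))


def lower_tail(x: int, r: int, p: float) -> float:
    """P(X <= x) for X binomial with r trials and probability p."""
    return sum(comb(r, j) * p**j * (1 - p) ** (r - j) for j in range(0, x + 1))


def solve_increasing(f: Callable[[float], float], target: float, steps: int = 60) -> float:
    """Bisection on [0, 1] for f(p) = target, assuming f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_limits(x: int, r: int, tail: float = 0.025) -> tuple[float, float]:
    """Limits excluding at most `tail` at either end (Clopper-Pearson construction)."""
    lower = 0.0 if x == 0 else solve_increasing(lambda p: upper_tail(x, r, p), tail)
    # P(X <= x) decreases as p grows, so bisect on its negative.
    upper = 1.0 if x == r else solve_increasing(lambda p: -lower_tail(x, r, p), -tail)
    return lower, upper


print(exact_limits(3, 20))  # hypothetical: 3 affected in a treatment group of 20
```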


MODEL I (c).—It limits our horizon unduly if we confine our interpretation of confidence limits to situations in which we can assume without appreciable error that our score distribution is approximately normal and the confidence interval itself expressible in terms of its variance. The latter has no relevance when the universe is rectangular; and we may therefore deepen our insight into the logic of confidence theory if we now lay aside any preoccupation with the normal distribution. As an elementary example of the confidence approach to estimation in the rectangular universe, we may consider the following model. A lottery wheel has s sectors with consecutive scores from 1 to s, so that the proportion of sectors whose score value (x) exceeds m (< s) is (s − m) ÷ s. We shall suppose that we spin it once and record x. Our first problem will be: what can we legitimately say about s?


In the treatment of the foregoing model, Model I (b), we have side-stepped a limitation of interval estimation in the domain of discrete score values by assuming a good enough normal fit. Unless we postulate a continuous distribution we cannot in fact assign an uncertainty safeguard ($P_f = \alpha$) or confidence level $(1 - P_f) = (1 - \alpha)$ to an admissible range of score values. The best we can assert is a statement of the form $P_f \le \alpha$ or $P_f < \alpha$, as when we use tables of the binomial in the situation of Model I (b). One reason for this is that we can assign more than one value to m consistent with a fixed value $P_f = \alpha$ for the rule to disregard all samples if x > m. If the score x is an integer, e.g. k or (k + 1), we can postulate an infinitude of values of m, in the range $k \le m < k + 1$, to which we can assign the same probability $\alpha$ that x > m.

It will be convenient to write $P(x > k)$ for the probability that x exceeds k and $P(x \ge k) = P(x > k - 1)$ for the probability that x is not less than k. If k is an integer there are (s − k) score values in the range x > k and (s − k + 1) in the range x ≥ k, whence:

$$P(x > k) = \frac{s - k}{s} \quad\text{and}\quad P(x \ge k) = \frac{s - k + 1}{s} \qquad \text{(iii)}$$
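If $k + 1 > m \ge k$, so that m is either a non-integral value in the interval between k and (k + 1) or is the integer k itself, we may write $m = (k + \varepsilon)$ and $k = (m - \varepsilon)$ for values of $\varepsilon$ in the range $0 \le \varepsilon < 1$. When $\varepsilon = 0$, we may write:

$$P(x > m) = \frac{s - m}{s} \quad\text{and}\quad P(x \ge m) = \frac{s - m + 1}{s}.$$

When $\varepsilon > 0$, no integer score lies between k and m, so that:

$$P(x > m) = P(x > k) = \frac{s - k}{s} = \frac{s - m + \varepsilon}{s},$$
$$P(x \ge m) = P(x > k) = \frac{s - k}{s} = \frac{s - m + \varepsilon}{s},$$
$$\therefore\ P(x > m) > \frac{s - m}{s} \quad\text{and}\quad P(x \ge m) > \frac{s - m}{s} \qquad \text{(iv)}$$

We may subsume both (iii) and (iv), to cover the possibility that m may or may not be a whole number, in the expressions:

$$P(x > m) \ge \frac{s - m}{s} \quad\text{and}\quad P(x \ge m) > \frac{s - m}{s}.$$

Let us now set $m = \alpha s$, so that:

$$\text{Rule (i):}\quad P(x > \alpha s) \ge 1 - \alpha \qquad \text{(v)}$$
$$\text{Rule (ii):}\quad P(x \ge \alpha s) > 1 - \alpha \qquad \text{(vi)}$$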


The proportion of all samples whose score x exceeds $\alpha s$ is thus no less than $100(1 - \alpha)$ per cent.; and the proportion of all samples whose score x is not less than $\alpha s$ is greater than $100(1 - \alpha)$ per cent.
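Statements (v) and (vi) can be checked by brute force. The sketch below (our own illustration; the bound on s is arbitrary) counts, for every wheel size s up to that bound, how many of the s equally likely scores satisfy $x > \alpha s$ and $x \ge \alpha s$. Exact rational arithmetic avoids spurious rounding at the boundary $x = \alpha s$.

```python
from fractions import Fraction


def tail_probs(s: int, alpha: Fraction) -> tuple[Fraction, Fraction]:
    """Exact P(x > alpha*s) and P(x >= alpha*s) for a single spin with scores 1..s."""
    m = alpha * s
    above = sum(1 for x in range(1, s + 1) if x > m)
    at_least = sum(1 for x in range(1, s + 1) if x >= m)
    return Fraction(above, s), Fraction(at_least, s)


def check(max_s: int, alpha: Fraction) -> bool:
    """True if (v) and (vi) hold for every wheel size s = 1, ..., max_s."""
    for s in range(1, max_s + 1):
        p_gt, p_ge = tail_probs(s, alpha)
        if not (p_gt >= 1 - alpha and p_ge > 1 - alpha):  # (v) and (vi)
            return False
    return True


print(check(400, Fraction(1, 20)))  # alpha = 0.05; prints True
```

Equality in (v) occurs exactly when $\alpha s$ is a whole number, which is the case taken up below.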


We may set out the implications of the foregoing statements as below:

Event       Probability of     Equivalent    Probability of    Probability of
            its Occurrence     Assertion     its Truth         its Falsehood
x > αs      ≥ (1 − α)          s < x/α       ≥ (1 − α)         P_f ≤ α
x ≥ αs      > (1 − α)          s ≤ x/α       > (1 − α)         P_f < α





We may express this by saying that our uncertainty safeguard for the assertion that s is less than 20x does not exceed 5 per cent., and our uncertainty safeguard for the assertion that s is not more than 20x is less than 5 per cent. On the basis of observations of single spins with scores of x = 5 and x = 10, respectively, our assertions would thus take the following form, if we deem α = 0.05 an acceptable level of uncertainty:

Rule    x = 5      x = 10     P_f
(i)     s < 100    s < 200    ≤ 0.05
(ii)    s < 101    s < 201    < 0.05





To say that s < 100 in this context is to say that the upper confidence limit is 99. In terms of confidence limits we therefore write the above as:

Upper Confidence Limit of s
Rule | x = 5 | x = 10 | P_f
(i) | 99 | 199 | ≤ 0.05
(ii) | 100 | 200 | < 0.05





Why we cannot express our confidence level in the form of an exact specification of the uncertainty safeguard of the form P_f = α will be clear if we state the foregoing rules in another way. In effect, Rule (i) signifies that we propose to disregard all samples if x ≤ αs, and Rule (ii) that we shall consistently disregard samples if x < αs. We can get a backstage view of their implications if we determine the proportion of excluded samples, i.e. the true uncertainty safeguard prescribed by each rule, for values of s in the neighbourhood of 200, when α = 0.05 defines the upper limit of acceptability for our uncertainty safeguard and the sample score is x = 10. For s = 199, 200, and 201 respectively, αs = 9.95, 10, and 10.05.


By Rule (i) we disregard samples whose scores do not exceed 9, 10, and 10 respectively. The exact probabilities (P_f) of doing so are respectively:

$$\frac{9}{199},\qquad \frac{10}{200},\qquad \frac{10}{201}.$$

By Rule (ii) we disregard samples whose scores do not exceed 9, 9, and 10 respectively, with probabilities:

$$\frac{9}{199},\qquad \frac{9}{200},\qquad \frac{10}{201}.$$

Thus the values of P_f for s in the neighbourhood of 200 are:

s | Rule (i) | Rule (ii)
199 | 0.045 | 0.045
200 | 0.05 | 0.045
201 | 0.0497 | 0.0497





Rule (i) will make P_f = 0.05 = α when s is an exact multiple of 20 = α⁻¹; but otherwise P_f < α. Rule (ii) makes P_f = α − 1/s, i.e. nearly equal to α, when s is an exact multiple of 20, but always less than α.
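The backstage count of excluded samples is easy to reproduce exactly. In this minimal sketch, which assumes equally probable scores 1 to s, the function `excluded_fraction` is a hypothetical helper of our own. Rule (i) is taken to disregard x ≤ αs and Rule (ii) to disregard x < αs, as stated above.

```python
from fractions import Fraction
import math

ALPHA = Fraction(1, 20)   # alpha = 0.05


def excluded_fraction(s, rule, alpha=ALPHA):
    """Proportion of the scores 1..s disregarded by a rule when s is the true value.

    Rule "i" disregards x <= alpha*s; rule "ii" disregards x < alpha*s.
    """
    m = alpha * s
    if rule == "i":
        n_excluded = math.floor(m)       # integers 1..floor(m)
    elif rule == "ii":
        n_excluded = math.ceil(m) - 1    # integers strictly below m
    else:
        raise ValueError("rule must be 'i' or 'ii'")
    return Fraction(n_excluded, s)


for s in (199, 200, 201):
    p1, p2 = excluded_fraction(s, "i"), excluded_fraction(s, "ii")
    print(s, p1, p2, f"{float(p1):.4f}", f"{float(p2):.4f}")
```

Running the same function over a long range of s confirms the pattern described: Rule (i) reaches α exactly only at multiples of 20, while Rule (ii) never reaches it.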


We did not have to face the issue last discussed in the context of MODEL I (b), because we invoked a normal approximation for the summation of the terms of a truly discrete binomial sample distribution. It is therefore instructive to re-examine the foregoing model situation on the assumption that the score x is a continuous rectangular variate. We may then interpret x ≥ k as x > (k − ½) and x ≤ k as x < (k + ½). To accommodate all discrete values in the range x = 1 to x = s inclusive, we must accordingly extend the range of the continuous distribution from x = ½ to x = (s + ½).
On this understanding, our formal definition of the continuous rectangular distribution has merely to satisfy two conditions:

(a) the probability f(x)dx that a score lies in the range x ± ½dx is constant for all values of x, that is f(x)dx = K·dx;

(b) the complete integral is numerically equal to unity, that is

$$\int_{1/2}^{s+1/2} K\,dx = 1 = K s, \quad\text{and}\quad K = \frac{1}{s}.$$

The probabilities that the score lies in the range from 1 to k or beyond k are then expressible as:

$$P(x \le k) = \frac{1}{s}\int_{1/2}^{k+1/2} dx = \frac{k}{s}, \quad\text{and}\quad P(x > k) = 1 - \frac{k}{s}.$$


The above statement is exactly true of the discrete distribution, since P(x ≤ k) = P(x < k + ½) if x is necessarily an integer. In effect, we make our range from ½Δx to s + ½Δx, since Δx = 1; and we may neglect Δx if s is very large, as we must assume if we invoke the continuous distribution as a descriptive device. We shall then say that the range is from 0 to s, and admit fractional values of x consistent with the specification:

$$P(x > k) = \frac{1}{s}\int_{k}^{s} dx = 1 - \frac{k}{s}.$$

Accordingly, we now proceed on the assumption that x can have any real value in the range 0 to s. To make P(x > k) = 1 − α we then put k = sα, so that:

$$P(x > s\alpha) = 1 - \alpha.$$

Within the framework of the rule implicit in the procedure, we then assign (1 − α) as the probability of correctly asserting that

$$s < \frac{x}{\alpha}.$$

When α = 0.05, this is equivalent to assigning P_f = 0.05 to the assertion that s lies within the range from 1 to 20x.


We have hitherto confined our attention to a procedure which entitles us to assign to s an upper confidence limit with an uncertainty safeguard P_f ≤ α. If we wish to place it with a pre-assigned uncertainty safeguard in an interval ax > s > bx, the form of statement we may make is no longer unique. If we may justifiably proceed on the assumption that we can assign an exact uncertainty safeguard P_f = γ to what assertions we do make within the framework of a prescribed rule of procedure, i.e. that we may legitimately rely as above on the continuous distribution, we may write:

$$P(k < x < m) = \frac{1}{s}\int_{k}^{m} dx = \frac{m-k}{s}.$$

If we now write k = βs and m = αs,

$$P(\beta s < x < \alpha s) = \alpha - \beta.$$

We then assign an uncertainty safeguard P_f = 1 − (α − β) to the assertion:

$$\frac{x}{\beta} > s > \frac{x}{\alpha}.$$

If β = 0.025 and α = 0.975, so that P_f = 0.05, our final statement will thus be:

$$\text{(i)}\quad 40x > s > \frac{40x}{39}.$$
(i) x ) 39 
Now P_f = 0.05 also if β = 0.01 and α = 0.96. We are therefore entitled to assign P_f = 0.05 as the uncertainty safeguard to the alternative assertion:

$$\text{(ii)}\quad 100x > s > \frac{25x}{24}.$$
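The long-run meaning of either assertion can be illustrated by simulation. This sketch assumes, as the text does, a continuous rectangular score on (0, s) with s held fixed. The function `coverage` is a hypothetical helper of our own. It applies one and the same rule to every trial and counts how often the assertion turns out true.

```python
import random


def coverage(s, beta, alpha, trials=100_000, seed=1):
    """Long-run proportion of trials in which the assertion x/beta > s > x/alpha is true.

    x is drawn from the continuous rectangular distribution on (0, s); s is held fixed.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, s)
        if x / beta > s > x / alpha:
            hits += 1
    return hits / trials


for s in (10.0, 200.0, 5000.0):
    c1 = coverage(s, beta=0.025, alpha=0.975)   # assertion (i):  40x > s > 40x/39
    c2 = coverage(s, beta=0.01, alpha=0.96)     # assertion (ii): 100x > s > 25x/24
    print(s, round(c1, 3), round(c2, 3))
```

Both assertions are correct in about 95 per cent. of trials, whatever the fixed value of s. This is why two different intervals can carry the same uncertainty safeguard.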
When we write down P(x > sα) = 1 − α or P(βs < x < αs) = α − β, we state the probability of an event, i.e. the value of the unit score x, within the framework of the classical theory of probability and the convenient fiction that the distribution is continuous. Our assertion signifies: for the fixed value s of the relevant parameter, P_{x.s} is the probability that the unit score will lie in such and such a range. We have refrained from writing the probability we assign to the equivalent assertions in the notation P(s < xα⁻¹) = 1 − α or P(β⁻¹x > s > α⁻¹x) = α − β, lest we should hastily interpret them in terms of inverse probability, i.e. as if we could legitimately say: for the fixed value x of the unit score, P_{s.x} is the probability that s will lie in the specified range. Such a form of words is inconsistent with Neyman's theory. We must interpret a statement in the form P(ax > s > bx) = γ as a summary of the long-run result of consistently adopting one and the same rule of conduct, regardless of the value (e.g. x = 5) the score x may have in any single trial, including the particular trial to which our specification of the interval estimate is referable. The formal statement of the rule will be adequate only if it explicitly specifies x as an unknown which may assume any value within its admissible range. We misinterpret it if we condense our verdict in such a form as:

$$P\left(200 > s > \frac{200}{39}\right) = 0.95.$$

This is an act of self-deception into which we easily slide if we write the formal identities:

$$\beta(h + dh) = x = \alpha h,$$

$$\alpha = \frac{x}{h}, \qquad \beta = \frac{x}{h+dh}, \qquad \alpha - \beta = \frac{x\,dh}{h(h+dh)} \simeq \frac{x\,dh}{h^{2}},$$

$$P(h + dh > s > h) = x\,h^{-2}\,dh.$$

We have now eliminated any reference to x as a variable in the expression on the left, and have obtained on the right what is seemingly the element of a probability distribution. It satisfies the fundamental property of the latter if we fix x and define the range of s from h = x to h = ∞, so that

$$x\int_{x}^{\infty} h^{-2}\,dh = 1.$$


This step, which leads to what Fisher calls a fiducial probability distribution, is admissible only if we can legitimately confine our statements to situations in which x has one and the same value (e.g. x = 5). We could then write:

$$P(s < k) = x\int_{x}^{k} h^{-2}\,dh = 1 - \frac{x}{k}.$$

If k = 20x, we thus obtain by a somewhat circuitous route a result already derived within the framework of the assumed continuous rectangular distribution, i.e. P(s < 20x) = 0.95. It follows that many results embodied in Fisher's approach to interval estimation will tally with those to which the theory of confidence intervals leads us; and indeed many statisticians were at one time blind to what we now see to be a radical difference. If we conceive x·f(h)dh as an element of a probability distribution, we have to regard h and x as independent to arrive at a numerical result consistent with confidence theory in the continuous domain; but we can do so only if we then treat x as a constant in the algebraic manipulation. We thus implicitly fix our interval in terms of a pre-assigned value of x to arrive at the specification of a probability dependent thereon; but this is inconsistent with the programme of Neyman's theory, which specifies the interval in terms of a pre-assigned probability independent of the outcome of any single trial, and hence of any pre-assigned value of x.


3. RELATION OF ESTIMATION TO TEST PROCEDURE 


If we regard the problem of estimation as that of 
assigning a probability to the truth of the assertion 
that some unique definitive parameter of a homo- 
geneous universe lies between specified limits, we 
sidestep the disquieting dilemma with which the 
balance sheet of Bayes confronts us. Bayes’s 
theorem is essentially about a stratified universe, 
e.g. a bag in which some pennies with unlike faces 
are unbiassed and one penny (through a defect of 
minting) has the King’s head on both sides. In 
effect, it says: 

“To know how often I should be right in judging 
a coin taken from the bag to be the one defective 
coin after getting ten successive heads in a single 
10-fold toss, I must also know how many other 
coins the bag contains.” 
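The arithmetic behind the quoted remark can be made explicit. In this minimal sketch the bag is assumed to hold one double-headed coin and (n − 1) unbiassed coins, with one coin drawn at random and tossed ten times. The function name `prob_defective` is our own. Bayes's theorem then gives a verdict that depends entirely on n.

```python
def prob_defective(n_coins, n_heads=10):
    """P(the tossed coin is the double-headed one | n_heads heads in n_heads tosses).

    Assumes one double-headed coin and (n_coins - 1) unbiassed coins in the bag,
    and that the coin is drawn at random.
    """
    prior = 1 / n_coins
    evidence_if_defective = 1.0        # the King's head shows on every toss
    evidence_if_fair = 0.5 ** n_heads
    numerator = prior * evidence_if_defective
    return numerator / (numerator + (1 - prior) * evidence_if_fair)


for n in (2, 100, 1025, 10_000, 1_000_000):
    print(f"{n:>9} coins: P(defective) = {prob_defective(n):.4f}")
```

With 1,025 coins the odds are even. With two coins the verdict is almost certain, and with a million it is almost certainly wrong, although the evidence from the single coin is the same in every case.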


If we presume its relevance to a general theory of 
test procedure, one horn of the dilemma to which 
the theorem draws attention is that we rarely have 
such knowledge. The other is that all the coins may 
indeed be alike, and our only source of relevant 
information is the one coin we have tossed. The 
theory of confidence intervals sidesteps the dilemma 
by restricting our attention to all we can know in 
situations which disclose prior knowledge of neither 
sort. It is the writers’ belief that Neyman (1934) 
did not overstate the novelty or the importance 
of the viewpoint we have explored against the back- 
ground of the preceding models, when he declared: 

The solution of the problem which I described as 
confidence intervals has been sought by the greatest 
minds since the work of Bayes 150 years ago. Any 
recent book on the theory of probability includes large 
sections concerning this problem. The present solution
means, I think, not less than a revolution in the theory 
of statistics. 
The model situations we shall examine below
suggest an approach, alternative to and more
sophisticated than that of the foregoing section,
to clarify what is common to the domain of test
procedure and to the domain of estimation. They
are also of subsidiary interest inasmuch as they
sidestep Bayes's dilemma by a route superficially
different from that we have so far followed. If we now 
explore their properties on that understanding, 
it is not because we believe Bayes’s theorem to 
have any necessary relevance to a theory of interval 
estimation. We shall examine the consequences of 
the assumption that it may have, only because of a 
widely prevalent belief that any adequate theory of 
statistical decision must come to terms with the 
concept of prior probability. 

In our previous communication, we examined a 
laboratory situation which precisely recalls the 
model appropriate to the issue Bayes propounds. 
We postulate a Drosophila culture known to contain 
two sorts of female flies, some with a sex-linked 
lethal gene and others normal. Our problem is to 
attach an uncertainty safeguard (P_f) to a decision
in favour of the hypothesis that a particular female
fruitfly is of one or the other sort. In this set-up
each hypothesis is referable to an existent sub-
population at risk; and we can speak about the
prior probability assignable to a hypothesis without
danger of self-deception. It is meaningful to do so,
because we conceive the situation as one which 
offers us a tangible preliminary choice at random, 
i.e. the extraction of the particular fly from the 
culture so constituted; but the choice is indeed 
tangible only because we initially possess the 
information that the culture contains two sorts of 
flies. It would not be a real choice if we had to 
make the decision on the understanding that the 
females are of one sort only. In any acceptable 
sense of the term, the prior probability of one hypo- 
thesis is then zero and that of the other is unity. If we 
could correctly assign the appropriate value to each 
hypothesis there would be no problem to solve. 

To those whose approach to problems of cognition 
is essentially behaviouristic, it is therefore by no 
means obvious that the model situation appropriate 
to Bayes’s theorem has any relevance to circum- 
stances in which we have no opportunity of exercising 
the preliminary act of choice prescribed thereby; 
but there need be no dispute about the relevance 
of the prior probabilities to the prescription of a 
test procedure. Our examination of the mixed 
culture situation shows that neither ignorance of the
precise prior probabilities each referable to an
existent population nor the unreality of the assumption
that we necessarily carry out the enquiry in two
stages need deter us from formulating a rule of
decision with an assignable uncertainty safeguard.
When we choose our rejection criterion to make
the error of the first kind equal to the error of the
second kind (α = β), we arrive at the identity
P_f = α for all values of the prior probabilities,
and the relation α ≤ P_f ≤ β for β > α is
likewise true for all values of the prior probabilities, 
including the limiting case when they are respectively 
zero and unity. Thus the rule holds good, whether 
we can realistically interpret the decision against 
the background of Bayes’s model or in situations 
to which the two-stage sampling procedure implicit 
in the model has no factual relevance. 
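The claim about the mixed culture reduces to one line of arithmetic, P_f = π·α + (1 − π)·β, where π is the proportion of flies of the first sort. The sketch below assumes that α is the probability of wrongly deciding against the first hypothesis and β that of wrongly deciding against the second. The function name `false_decision_rate` is hypothetical. The sketch sweeps π over its whole range, including the limiting values 0 and 1.

```python
def false_decision_rate(prior_h1, alpha, beta):
    """Unconditional P_f for a culture containing two sorts of flies.

    prior_h1: proportion of flies of the first sort (prior probability of H1)
    alpha:    P(decide against H1 | H1 true), error of the first kind
    beta:     P(decide against H2 | H2 true), error of the second kind
    """
    return prior_h1 * alpha + (1 - prior_h1) * beta


priors = [i / 100 for i in range(101)]   # includes the limiting cases 0 and 1

# Equal errors of the two kinds: P_f equals alpha whatever the priors
equal = [false_decision_rate(p, 0.05, 0.05) for p in priors]
# beta > alpha: P_f always lies between alpha and beta
unequal = [false_decision_rate(p, 0.05, 0.10) for p in priors]

print(min(equal), max(equal))
print(min(unequal), max(unequal))
```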

This is the course we now propose to adopt with 
respect to interval estimation. Our new models 
will admit of a factual preliminary choice of the 
sub-universe from which we sample, with a view 
to exhibiting the irrelevance of such an assumption 
and its implications to the procedure of interval 
estimation. Indeed, we shall postulate situations 
to which the Bayes balance sheet is truly relevant. 
Our universe will be a stratified universe, and our 
problem to attach an acceptable uncertainty safe- 
guard to the assertion that a parameter definitive 
of the particular stratum from which we take a 
particular sample lies within a specified range. 


MODEL II (a). With this end in view, we shall suppose that someone spins forty times one of 100 lottery wheels chosen at random, recording the mean score (M_x). Each such wheel has 1,024 sectors like the wheel of MODEL I (a), with scores of x, (x + 1), (x + 2) ... (x + 9), (x + 10) allocated respectively to 1, 10, 45 ... 10, 1 sectors. We do not know the value of x associated with the particular wheel selected for the spin; but we do know that each wheel is one of eleven types as follows:





Type | No. of Wheels | Value of x
I | 1 | 0.5
II | 3 | 0.6
III | 10 | 0.7
IV | 17 | 0.8
V | 20 | 1.1
VI | 7 | 1.3
VII | 12 | 1.5
VIII | 3 | 1.8
IX | 8 | 1.9
X | 2 | 2.0
XI | 17 | …





In this model set-up, we may construct eleven admissible hypotheses about the value of x, and hence of the expected mean M = (x + 5). For each hypothesis, the standard deviation of the distribution of the observed mean (M_x) of the 40-fold spin is σ_M = 0.25, and to each hypothesis we can assign a prior probability in Bayes's sense. If the observed mean score for the 40-fold spin is 6.3, as for MODEL I (a), the relevant information is as follows:








Hypothesis | Prior Probability | M | (M − M_x) ÷ σ_M
I | 0.01 | 5.5 | −3.2
II | 0.03 | 5.6 | −2.8
III | 0.10 | 5.7 | −2.4
IV | 0.17 | 5.8 | −2.0
V | 0.20 | 6.1 | −0.8
VI | 0.07 | 6.3 | 0
VII | 0.12 | 6.5 | +0.8
VIII | 0.03 | 6.8 | +2.0
IX | 0.08 | 6.9 | +2.4
X | 0.02 | 7.0 | +2.8
XI | 0.17 | … | …





We shall now make the following rule. We shall reject some hypotheses as inadmissible and reserve judgment on others, which we shall accordingly regard as admissible, applying to each hypothesis the same criterion of rejection: namely, that the deviation of the observed score (M_x = 6.3) from the expected value (M) prescribed by the particular hypothesis is numerically greater than 2σ_M. We then reject all hypotheses except IV to VIII inclusive, and are left with the assertion that M lies in the range 5.8 to 6.8, corresponding to values of x from 0.8 to 1.8.
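The screening step is mechanical, and a few lines reproduce it. This sketch uses the tabulated values of M for hypotheses I to X. XI is left out because its entry is not legible in the source. The names `EXPECTED` and `screen` are our own. A tiny tolerance is added only so that the boundary cases IV and VIII, whose deviations are exactly 2σ_M, are not misclassified through binary rounding.

```python
SIGMA_M = 0.25   # s.d. of the mean of a 40-fold spin, the same under every hypothesis
OBSERVED = 6.3   # observed mean score M_x

# Expected means M = x + 5 for hypotheses I to X, as tabulated above.
EXPECTED = {"I": 5.5, "II": 5.6, "III": 5.7, "IV": 5.8, "V": 6.1,
            "VI": 6.3, "VII": 6.5, "VIII": 6.8, "IX": 6.9, "X": 7.0}


def screen(expected, observed, sigma, k=2.0, tol=1e-9):
    """Reject H if |M - M_x| > k*sigma; otherwise reserve judgment (admissible)."""
    admissible, rejected = [], []
    for name, m in expected.items():
        z = (m - observed) / sigma
        if abs(z) > k + tol:     # tol absorbs rounding in differences such as 5.8 - 6.3
            rejected.append(name)
        else:
            admissible.append(name)
    return admissible, rejected


admissible, rejected = screen(EXPECTED, OBSERVED, SIGMA_M)
print("admissible:", admissible)
print("M from", EXPECTED[admissible[0]], "to", EXPECTED[admissible[-1]])
```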

Our uncertainty safeguard for the rejection of every hypothesis when true is α = 0.05, since our rejection criterion is modular, i.e. it takes account of the numerical value of the deviation regardless of its sign. That the unconditional uncertainty safeguard for the final verdict is also 0.05, as for MODEL I (a), we may make explicit as follows. We first remind ourselves that we can falsely reject only one hypothesis, since only one can be true. Thus the unconditional probability of a false verdict is the unconditional probability of falsely rejecting one or other of an exclusive set of hypotheses, and is therefore obtainable by recourse to the addition rule. If P_h is the prior probability that the particular hypothesis H is applicable to the situation, i.e. that we choose at random a wheel of type H to spin, the probability of falsely rejecting it is αP_h; and by definition:

$$\sum_{h=1}^{11} P_h = 1.$$

The probability of making a false decision is the probability of falsely rejecting any one of the hypotheses, i.e.:

$$P_f = \sum_{h=1}^{11} \alpha P_h = \alpha \sum_{h=1}^{11} P_h = \alpha.$$

Thus α is our uncertainty safeguard for the assertion that M lies within the prescribed limits; and the prior probabilities of Bayes do not affect its value. We have thus arrived at exactly the same result as in the MODEL I (a) situation, where we set the same uncertainty safeguard to the same range of admissible values of the parameter x in the unstratified universe of one and the same wheel.
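The independence of P_f from the priors can be checked without any normal approximation. Each wheel score is x plus the number of "heads" among ten fair binary components, since the sectors 1, 10, 45 ... 10, 1 out of 1,024 are binomial coefficients. The total excess over 40x in a 40-fold spin is therefore binomial with n = 400 and p = ½. The criterion |M_x − M| > 2σ_M = 0.5 becomes |S − 200| > 20. In this sketch, the helper names are our own, and the prior 0.17 for type XI is inferred from the wheel count totalling 100. The exact per-hypothesis rate comes out somewhat below the nominal 0.05 of the normal approximation, but the unconditional rate equals it for any set of priors.

```python
from math import comb

N_UNITS = 400      # 40 spins x 10 binary components per spin
CENTRE = N_UNITS // 2
HALF_WIDTH = 20    # |mean - M| > 2 * 0.25  <=>  |S - 200| > 20


def alpha_exact():
    """Exact probability that the 2-sigma criterion rejects a true hypothesis."""
    tail = sum(comb(N_UNITS, k) for k in range(N_UNITS + 1)
               if abs(k - CENTRE) > HALF_WIDTH)
    return tail / 2 ** N_UNITS


def unconditional_pf(priors, alpha_h):
    """P_f = sum over h of P_h * alpha_h for an exclusive set of hypotheses."""
    return sum(p * a for p, a in zip(priors, alpha_h))


a = alpha_exact()
table_priors = [0.01, 0.03, 0.10, 0.17, 0.20, 0.07, 0.12, 0.03, 0.08, 0.02, 0.17]
uniform_priors = [1 / 11] * 11
degenerate_priors = [1.0] + [0.0] * 10     # an unstratified universe, as in MODEL I
for priors in (table_priors, uniform_priors, degenerate_priors):
    print(round(unconditional_pf(priors, [a] * 11), 6), round(a, 6))
```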


In the set-up of this Model, we regard any one of a 
limitless number of values p may have as a hypothesis 
referable to a conceivably, but not necessarily, 
existent population at risk. We thus interpret the 
process of estimation as a method of screening an 
exhaustive set of hypotheses as admissible or 
otherwise by successively applying to each a test 
prescribing the same probability of rejection if the hypothesis is indeed true. Our universe of hypotheses so conceived is a stratified universe, in which strata with the same definitive parameter p provisionally constitute an existent population at risk, with an assignable finite prior probability in the jargon of Bayes's theorem. Bayes's prior probabilities (P_h) are then inherent in the initial formulation of the problem; but they do not appear in the solution. Consequently, we are free to assign to the prior probability of any single hypothesis any value in the range 0 to 1 consistent with the restriction that the sum of all the prior probabilities is unity. Whether an existent population corresponds to a particular hypothesis in our fictitious stratified universe is therefore immaterial. That a particular hypothesis to which we apply the test corresponds to no existent population merely means that P_h = 0. To conceive the universe as unstratified is to assign P_h = 1 to one stratum and P_h = 0 to every other one. In this sense, MODEL I is therefore a limiting case of MODEL II.


This way of looking at the problem of estimation 
makes the distinction between the domain of test 
decision and estimation less clear-cut than the 
alternative. If we interpret the procedure of estima- 
tion in terms of the model of this section, we can 
regard it as the performance of a battery of tests, 
but the score value which defines the criterion of 
rejection is different for each test and the decision 
to reject any one hypothesis or group of hypotheses 
does not prescribe acceptance of any other single 
hypothesis. We successively apply to each a test 
involving a new value of the score deviation (x — M) 
as the criterion which ensures the same probability 
of rejection for each hypothesis when true. If we 
assert that one group of hypotheses constitutes an 
admissible (in contradistinction to a residual group 
as an inadmissible) set, we then do so on the assump- 
tion that one of the former is identifiable with the 
correct one. 


MODEL II (c). In our choice of a common criterion of rejection for the hypotheses sifted in the treatment of the foregoing model, we may assume, as we have assumed, a normal distribution of the mean score without incurring exceptionable error. Accordingly, we have defined the uncertainty safeguard of the prescribed rule by the identity P_f = α; but any such formulation is strictly valid only in the fictitious domain of the continuous variate. It will therefore be profitable to examine a model situation in which we cannot legitimately invoke the normal, or any other continuous, distribution.

In the homogeneous universe of MODEL I, we have seen that we can set an upper limit (P_f ≤ α or P_f < α) to the uncertainty safeguard we attach to a confidence boundary in the domain of discrete score values; but we cannot make an exact statement of the form P_f = α.
Let us now therefore look at the problem raised by MODEL I (c) of SECTION 2 (above) as one of sampling in a stratified universe. We shall postulate, as below, an assemblage of one hundred lottery wheels of twelve types with consecutive scores 1 to m inclusive, if s_h = m is the number of sectors of a wheel of type H. Thus we have twelve hypotheses about s to explore, each referable to an existent population at risk; and we shall once more limit our decisions to rejection and reservation of judgment. We know the score x of a single spin without knowing the type of wheel to which it is referable. Our problem will be to assign a probability to an admissible set of hypotheses.





Type of Wheel (H) | No. of Sectors (s_h) | No. of Corresponding Wheels (N_h) | Prior Probability of Choice (N_h ÷ 100)
1 | 5 | 13 | 0.13
2 | 19 | 2 | 0.02
3 | 20 | 1 | 0.01
4 | 21 | 3 | 0.03
5 | 39 | 7 | 0.07
6 | 40 | 12 | 0.12
7 | 99 | 3 | 0.03
8 | 100 | 4 | 0.04
9 | 101 | 9 | 0.09
10 | 199 | 10 | 0.10
11 | 200 | 15 | 0.15
12 | 201 | 21 | 0.21
Total | | 100 | 1.00





For MODEL I (c) we formulated two rules:

Rule (i): s < x/α, with P_f ≤ α

Rule (ii): s ≤ x/α, with P_f < α

In effect, the first rule states that we reject the hypothesis s = s_h unless x > αs_h; and the second states that we reject the hypothesis s = s_h unless x ≥ αs_h. Thus our rejection criteria are:

Rule (i): Reject if x ≤ αs_h, with P_f ≤ α

Rule (ii): Reject if x < αs_h, with P_f < α





As below, we may then draw up a table of verdicts based on each of the foregoing rules for different experiments in which x = 5 and x = 10 respectively. In each case we assume that α = 0.05 is an acceptable level of uncertainty.

Hypothesis | No. of Sectors (s_h) | Criterion (αs_h, α = 0.05) | x = 5, Rule (i) | x = 5, Rule (ii) | x = 10, Rule (i) | x = 10, Rule (ii)
1 | 5 | 0.25 | Open | Open | Open | Open
2 | 19 | 0.95 | Open | Open | Open | Open
3 | 20 | 1.00 | Open | Open | Open | Open
4 | 21 | 1.05 | Open | Open | Open | Open
5 | 39 | 1.95 | Open | Open | Open | Open
6 | 40 | 2.00 | Open | Open | Open | Open
7 | 99 | 4.95 | Open | Open | Open | Open
8 | 100 | 5.00 | REJECT | Open | Open | Open
9 | 101 | 5.05 | REJECT | REJECT | Open | Open
10 | 199 | 9.95 | REJECT | REJECT | Open | Open
11 | 200 | 10.00 | REJECT | REJECT | REJECT | Open
12 | 201 | 10.05 | REJECT | REJECT | REJECT | REJECT
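The table of verdicts can be regenerated directly from the two rejection criteria. In this sketch the names `verdict` and `open_hypotheses` are hypothetical helpers of our own. Exact fractions keep boundary cases such as αs_h = 5 and αs_h = 10 from being distorted by rounding.

```python
from fractions import Fraction

ALPHA = Fraction(1, 20)
SECTORS = [5, 19, 20, 21, 39, 40, 99, 100, 101, 199, 200, 201]   # s_h for types 1 to 12


def verdict(x, s_h, rule, alpha=ALPHA):
    """Rule "i" rejects s = s_h if x <= alpha*s_h; rule "ii" if x < alpha*s_h."""
    criterion = alpha * s_h
    reject = x <= criterion if rule == "i" else x < criterion
    return "REJECT" if reject else "Open"


def open_hypotheses(x, rule):
    """Values of s_h on which judgment is reserved."""
    return [s for s in SECTORS if verdict(x, s, rule) == "Open"]


for x in (5, 10):
    for rule in ("i", "ii"):
        print(f"x = {x}, Rule ({rule}): largest open s = {max(open_hypotheses(x, rule))}")
```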

The range of s values covered by open verdicts thus corresponds precisely with the outcome of our examination of MODEL I (c), for which the upper confidence limits are 99 and 199 respectively for x = 5 and x = 10 with P_f ≤ 0.05 (Rule (i)), or 100 and 200 respectively for x = 5 and x = 10 with P_f < 0.05 (Rule (ii)). The meaning of the correspondence is evident if we recall the meaning of the true conditional uncertainty safeguard (P_{f.h}) of hypothesis H in the domain of discrete score values. If our criterion of rejection is x ≤ αs, we exclude the sample whose score value is x = αs only when αs itself is an integer. Thus P_{f.h}, the proportion of excluded score values when hypothesis H is true, is the ratio to s of the nearest integer not exceeding αs, and is always less than or equal to α. If we write (αs_h − ε_h) for the number of score values excluded under hypothesis H, where 0 ≤ ε_h ≤ 1, we may thus write:


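$$P_{f.h} = \alpha - \frac{\epsilon_h}{s_h},$$

$$\therefore\; P_f = \sum_{h=1}^{12} P_h \cdot P_{f.h} = \alpha \sum_{h=1}^{12} P_h - \sum_{h=1}^{12} P_h \frac{\epsilon_h}{s_h},$$

$$\therefore\; P_f = \alpha - \sum_{h=1}^{12} P_h \frac{\epsilon_h}{s_h}.$$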
h=1 


Since we have chosen the rejection criterion so that P_{f.h} ≤ α, all values of ε_h must be zero or positive. Rule (ii) makes them all positive, whence we obtain, as for MODEL I (c),

$$P_f < \alpha.$$

In this instance, some values of ε_h are positive when we apply Rule (i) and others zero. Thus P_f < α as before; but this is not inconsistent with the assertion P_f ≤ α, being included therein. A generalized MODEL II situation must take stock of the possibility that P_{f.h} = α for each wheel, as would be true under Rule (i) if we knew that the recorded score referred to a wheel of any one of types 3, 6, 8, 11 above. For each of these P_{f.h} = 0.05 and ε_h = 0, as will be seen by citing the values of P_{f.h} prescribed by our rejection criteria, viz.:








s_h | αs_h | P_{f.h}, Rule (i) | P_{f.h}, Rule (ii)
5 | 0.25 | 0.0000 | 0.0000
19 | 0.95 | 0.0000 | 0.0000
20 | 1.00 | 0.0500 | 0.0000
21 | 1.05 | 0.0476 | 0.0476
39 | 1.95 | 0.0256 | 0.0256
40 | 2.00 | 0.0500 | 0.0250
99 | 4.95 | 0.0404 | 0.0404
100 | 5.00 | 0.0500 | 0.0400
101 | 5.05 | 0.0495 | 0.0495
199 | 9.95 | 0.0452 | 0.0452
200 | 10.00 | 0.0500 | 0.0450
201 | 10.05 | 0.0497 | 0.0497
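Both the conditional safeguards P_{f.h} and the unconditional P_f for the twelve-type assemblage can be computed exactly. This sketch takes the wheel counts from the table of prior probabilities above. The helper names `pf_h` and `unconditional_pf` are our own. It shows numerically that P_f falls below α under either rule, and further below it under Rule (ii).

```python
from fractions import Fraction
import math

ALPHA = Fraction(1, 20)
# (s_h, N_h) for the twelve wheel types tabulated above; prior P_h = N_h / 100
WHEELS = [(5, 13), (19, 2), (20, 1), (21, 3), (39, 7), (40, 12),
          (99, 3), (100, 4), (101, 9), (199, 10), (200, 15), (201, 21)]


def pf_h(s, rule, alpha=ALPHA):
    """Conditional uncertainty safeguard P_{f.h} when s = s_h is the true value."""
    m = alpha * s
    n_excluded = math.floor(m) if rule == "i" else math.ceil(m) - 1
    return Fraction(n_excluded, s)


def unconditional_pf(rule):
    """P_f = sum over h of P_h * P_{f.h} in the stratified universe."""
    return sum(Fraction(n, 100) * pf_h(s, rule) for s, n in WHEELS)


for s, _ in WHEELS:
    print(s, f"{float(pf_h(s, 'i')):.4f}", f"{float(pf_h(s, 'ii')):.4f}")
for rule in ("i", "ii"):
    print(f"Rule ({rule}): P_f = {float(unconditional_pf(rule)):.4f}")
```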
In the treatment of MODEL I (c) we have already
recognized one reason for regarding the concept 
of fiducial probability as an inadequate basis for a 
theory of statistical inference in that it restricts 
the field of discussion to continuous variates. 
Further consideration of the model situation we 
have last discussed gives us an opportunity for 
contrasting two theories of interval estimation from 
a different viewpoint. 

Fiducial probability takes its origin in concepts, 
some of which are common to the theory of confi- 
dence intervals; but Neyman’s development of the 
latter is inconsistent with Fisher’s interpretation 
of the former, unless there is some sense in which 
only one admissible pre-assigned rule of test pro- 
cedure is appropriate to one and the same situation. 
MODELS I (c) and II (c) do indeed refer to a situation
in which only one such rule invites our attention 
as relevant to the end in view; but we have not 
excluded the possibility that more than one might 
each have seemingly equal claims to commend it 
from a purely formal viewpoint. We shall now 
examine a situation in which this dilemma arises. 


Since a continuum is implicit in the concept of fiducial probability, we shall postulate a continuous rectangular distribution over the range ½ to s + ½, and examine what statements we may make when we draw two unit samples with scores x_1 and x_2. Two rules, though not the only two we may formulate, will serve well enough for heuristic purposes. We shall alternatively seek to prescribe an upper confidence limit to s with an uncertainty safeguard α by recourse to:

(i) the maximum score x_m, being x_m = x_1 if x_1 > x_2, and x_m = x_2 if x_2 > x_1;

(ii) the score sum x_12 = x_1 + x_2.

The probability that x_m ≤ m is the probability assignable to the joint occurrence that each score lies in the range from x = 0 to x = m inclusive, i.e.:

$$P(x_m \le m) = \frac{m^2}{s^2} \quad\text{and}\quad P(x_m > m) = 1 - \frac{m^2}{s^2}.$$

We wish our final assertion to take the form s < kx_m, with a probability (1 − α) of correct assertion if we consistently follow the test procedure, whence we write:

$$P(x_m > m) = (1-\alpha) \;\Rightarrow\; \frac{m^2}{s^2} = \alpha \quad\text{and}\quad m = s\sqrt{\alpha},$$

$$P(x_m > s\sqrt{\alpha}) = 1 - \alpha.$$

Within the framework of this rule, we then assign α as the uncertainty safeguard to the assertion:

$$s < \frac{x_m}{\sqrt{\alpha}}.$$


If we base our test procedure on x_12 defined as above, the reader unfamiliar with the continuous rectangular distribution will find it helpful first to make a simple chessboard diagram of the 2-fold discrete score sum distribution. It is then evident that, if x = 1 is the origin of the unit score distribution, we may express the probability that x_12 exceeds k (the complement of the probability that it lies in the range 2 to k) in two ways:

$$P(x_{12} > k) = \frac{(2s-k)(2s-k+1)}{2s^2} \quad\text{when } k \ge s+1;$$

$$P(x_{12} > k) = 1 - \frac{k(k-1)}{2s^2} \quad\text{when } k \le s+1.$$
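The chessboard argument can be carried out by brute force. This sketch counts cells of an s-by-s board of equally probable score pairs and compares the count with the two closed forms above. It then evaluates the discrete formula at k ≈ s√(2α) for a large s, to show it approaching the continuous result 1 − α. The helper names are our own.

```python
from fractions import Fraction
from itertools import product
import math


def tail_by_enumeration(s, k):
    """P(x_12 > k): count the cells of the s-by-s chessboard whose score sum exceeds k."""
    cells = sum(1 for x1, x2 in product(range(1, s + 1), repeat=2) if x1 + x2 > k)
    return Fraction(cells, s * s)


def tail_by_formula(s, k):
    """The two closed forms for the discrete score sum."""
    if k >= s + 1:
        return Fraction((2 * s - k) * (2 * s - k + 1), 2 * s * s)
    return 1 - Fraction(k * (k - 1), 2 * s * s)


def check_formula(s_max):
    """Compare enumeration and formula for every s up to s_max and every k from 2 to 2s."""
    for s in range(1, s_max + 1):
        for k in range(2, 2 * s + 1):
            assert tail_by_enumeration(s, k) == tail_by_formula(s, k), (s, k)
    return True


print(check_formula(25))

# For large s the discrete result approaches the continuous one:
# P(x_12 > s*sqrt(2*alpha)) = 1 - alpha
s, alpha = 10_000, 0.05
k = round(s * math.sqrt(2 * alpha))
print(float(tail_by_formula(s, k)))
```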


For the continuous case we may represent our chessboard geometrically as a rectangle of area s², and the region in which all values x_12 < k lie, when k < s, as a triangle of area ½k². Since we wish to associate a probability (1 − α) near unity to the truth of the assertion s < k⁻¹·x_12, our concern will be with the smaller value of k (Fig. 3).


[Figure 3: diagram plotted against the number of sectors (s). The line m = sα (boundary s < x/α) separates the region labelled "Accept only sample values of x in this region" from the region labelled "Reject all sample values of x in this region"; annotations read α = P(x ≤ sα) and 1 − α = P(x > sα).]

FIG. 3. Graphical representation of one-sided confidence limit for the number of score classes (s) of a rectangular universe when the score x refers to a single sample.


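For the continuous case we then write:

$$P(x_{12} < k) = \frac{k^2}{2s^2} = \alpha \quad\text{and}\quad k = s\sqrt{2\alpha},$$

$$\therefore\; P(x_{12} > s\sqrt{2\alpha}) = 1 - \alpha.$$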
Our second rule thus assigns α as the uncertainty safeguard to the assertion

$$s < \frac{x_{12}}{\sqrt{2\alpha}}.$$
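A short simulation makes the comparison of the two rules concrete. The sketch assumes two unit samples from the continuous rectangular distribution on (0, s) and applies both rules at α = 0.05. The function `compare_rules` is hypothetical. Its average upper limit is our own illustrative yardstick of the rules' merits, not one the text proposes. Both rules are right in about 95 per cent. of trials, but the limit based on x_m is smaller on average than the one based on x_12.

```python
import math
import random


def compare_rules(s, alpha=0.05, trials=100_000, seed=2):
    """Simulate two unit samples from the continuous rectangular distribution on (0, s).

    Returns, for the maximum-score rule and the score-sum rule, the proportion of
    trials in which the asserted upper limit exceeds s, and the mean upper limit.
    """
    rng = random.Random(seed)
    hits_max = hits_sum = 0
    total_max = total_sum = 0.0
    for _ in range(trials):
        x1, x2 = rng.uniform(0, s), rng.uniform(0, s)
        upper_max = max(x1, x2) / math.sqrt(alpha)     # rule (i):  s < x_m / sqrt(alpha)
        upper_sum = (x1 + x2) / math.sqrt(2 * alpha)   # rule (ii): s < x_12 / sqrt(2 alpha)
        hits_max += s < upper_max
        hits_sum += s < upper_sum
        total_max += upper_max
        total_sum += upper_sum
    return (hits_max / trials, hits_sum / trials,
            total_max / trials, total_sum / trials)


cov_max, cov_sum, mean_max, mean_sum = compare_rules(s=100.0)
print(f"coverage: max rule {cov_max:.3f}, sum rule {cov_sum:.3f}")
print(f"mean upper limit: max rule {mean_max:.1f}, sum rule {mean_sum:.1f}")
```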
We thus have two rules which assign different 
values to the upper confidence limit of s at one 
and the same confidence level (1 − α). In the
strictly behaviourist formulation of confidence 
theory by Neyman this involves no inconsistency. 
One rule may seem better than another, if it incor- 
porates more “information”; but its use may have 
drawbacks which outweigh its merit on that account. 
In Fisher’s theory of interval estimation no such 
freedom of choice is admissible. The avowed inten- 
tion of the concept of fiducial probability is to 
express the intensity of legitimate conviction 
referable to a particular sample. If so, only one 
rule can be right, namely the rule which invokes 
all the information the sample supplies. Fisher 
speaks of a statistic, i.e. sample score, which has 
this property, as the sufficient one. 


The two statistics x_m and x_12 used in the foregoing situation will serve to illustrate what is and what is not a sufficient statistic if we now consider x_1 and x_2 as unit samples from a discrete rectangular universe with a range of scores from 1 to s inclusive. In deriving a rule on the basis of either we have suppressed any explicit specification of x_1 and x_2. If our chosen statistic is defective it can be so only for that reason. We shall therefore ask: have we lost anything by withholding such information? We may answer this by considering the consequences of confining our attention in a sequence of trials to samples with some pre-assigned value of x_m or x_12.


Let us first suppose the pre-assigned value of x_m to be 3. The different sorts of double samples that are consistent with this value occur with equal frequencies and are specifiable as follows: (1,3), (2,3), (3,3), (3,2), (3,1). This set of equally frequent values is the same for all values of s consistent with the specification x_m = 3. Thus we have suppressed no information about s by scoring our sample in this way. Is the same true of x_12?


Let us now consider samples with reference to which $x_t = 8$. This specification is consistent with any value $s \geq 4$, but this condition does not suffice to specify what individual values $x_1$ and $x_2$ have. If s = 4, the only double sample consistent with the specification $x_t = 8$ is (4,4). If s = 5, three paired score values are allowable: (3,5), (4,4), (5,3). If s = 6 we may have: (2,6), (3,5), (4,4), (5,3), (6,2). Thus we can say more about s if we know the individual scores $x_1$ and $x_2$ than we can if we know only the value of the insufficient statistic $x_t$; but the individual values of $x_1$ and $x_2$ tell us no more than we already know, if told the value of the sufficient statistic $x_m$.
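The contrast can be checked by brute enumeration. The Python sketch below (the function name `pairs_given` is ours, purely illustrative) lists every ordered pair $(x_1, x_2)$ with scores from 1 to s that yields a given value of the maximum or of the sum; since each listed pair is equally probable, the list fixes the conditional distribution.

```python
from itertools import product

def pairs_given(s, statistic, value):
    """All ordered pairs (x1, x2), scores 1..s, for which statistic(x1, x2) == value."""
    return [pair for pair in product(range(1, s + 1), repeat=2)
            if statistic(*pair) == value]

# Conditioning on the maximum x_m = 3: the same five pairs whatever s >= 3 may be.
for s in (3, 4, 6, 10):
    print(s, pairs_given(s, max, 3))

# Conditioning on the sum x_t = 8: the admissible pairs change with s.
for s in (4, 5, 6):
    print(s, pairs_given(s, lambda a, b: a + b, 8))
```

The maximum yields an identical list for every s, whereas the list for the sum grows with s, which is exactly the loss of information described above.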

We have now to state the definition of a sufficient statistic formally. To do so, we first remind ourselves that to each 2-fold sample specified in terms of the sequence of unit samples we may assign as above a bivariate score, e.g. (3,5) or (5,3). We may then speak of $P_{12.s}$ as the unconditional probability that any sample has the bivariate score $(x_1, x_2)$, and $P_{12.m}$ as the conditional probability that it has this score if $x_m$ is the maximum score. In the same sense, we may label the unconditional probability of a multivariate score $(x_1, x_2, x_3, \ldots, x_r)$ definitive of an r-fold sample as $P_{(1.2.3\ldots r).p}$ for a distribution whose definitive parameter is p, and $P_{(1.2.3\ldots r).x}$ as its conditional probability when the sample statistic is x, if we can define it from our knowledge of x alone. We may then define by $P_{x.p}$ the probability that the sample statistic will be x if the parameter is p, and obtain by recourse to the product rule:

$$P_{(1.2.3\ldots r).p} = P_{x.p} \cdot P_{(1.2.3\ldots r).x}$$

We have now split into two factors the unconditional 
probability of getting the multivariate score which 
summarizes all the information the sample supplies, 
since its specification incorporates both the numerical 
values of each constituent unit sample and the order 
in which they turn up. One of these factors is 
independent of p if the statistic is sufficient, i.e. if 
(as is true of $P_{12.m}$) we can specify it without
knowing the value of the universe parameter. We 
thus take as our formal criterion of a sufficient 
statistic the resolution of the probability of the 
multivariate score into two factors of which one 
does not contain p. 

By recourse to a simple chessboard lay-out of $s^2$ cells, with border scores from $x_1 = 1$ to $x_1 = s$ inclusive and $x_2 = 1$ to $x_2 = s$ inclusive, we may amplify this breakdown with reference to $x_m$ and $x_t$ for the discrete rectangular universe. Each cell of the grid is referable to a unique pair of values of $x_1$ and $x_2$, but the same value of $t = (x_1 + x_2)$ or of $x_m = m$ is assignable to more than one cell if t > 2 (or m > 1). Cells specified by $x_m = m$ lie on two sides of a square of side m, there being (2m − 1) in all. If we write $P_{12.s}$ for the probability that the sample records the unique pair of score values $x_1$ and $x_2$ when the number of sectors is s, $P_{12.m}$ for the probability that it records these two values when $x_m = m$ on the same assumption, and $P_{m.s}$ for the probability that $x_m = m$ when s is the number of sectors, we thus see that

$$P_{12.s} = \frac{1}{s^2}, \qquad P_{12.m} = \frac{1}{2m - 1}, \qquad P_{m.s} = \frac{2m - 1}{s^2}.$$
Hence, in accordance with the product rule for conditional probabilities,

$$P_{12.s} = P_{12.m} \cdot P_{m.s}.$$
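The factorization can be verified cell by cell with exact rational arithmetic. The sketch below is our own illustration, not part of the original argument: for each cell of the $s \times s$ grid it counts the cells sharing the same maximum to obtain $P_{12.m}$, and checks that $P_{12.m} \cdot P_{m.s}$ reproduces $1/s^2$.

```python
from fractions import Fraction
from itertools import product

def check_max_factorization(s):
    """Verify P_{12.s} = P_{12.m} * P_{m.s} for every cell of an s-by-s grid."""
    cells = list(product(range(1, s + 1), repeat=2))
    p_12_s = Fraction(1, s * s)                        # every ordered pair equally likely
    for x1, x2 in cells:
        m = max(x1, x2)
        n_m = sum(1 for c in cells if max(c) == m)     # cells on two sides of the m-square
        assert n_m == 2 * m - 1
        p_12_m = Fraction(1, n_m)                      # does not involve s
        p_m_s = Fraction(n_m, s * s)
        assert p_12_m * p_m_s == p_12_s
    return True

print(all(check_max_factorization(s) for s in range(1, 9)))
```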


We have thus split the probability assignable to the bivariate score $x_1, x_2$ into two factors, one of which ($P_{12.m}$) is independent of s; and we might be tempted to think that we could specify a corresponding identity $P_{12.s} = P_{12.t} \cdot P_{t.s}$, referable to the probabilities of getting the score sum t when there are s sectors and getting the particular value of the bivariate score if also $(x_1 + x_2)$ has the particular value t. Actually we cannot do so. All samples such that $(x_1 + x_2) = t$ lie in a diagonal of (t − 1) cells if $s \geq (t - 1)$; and if we knew this we might write $P_{12.t} = (t - 1)^{-1}$, which is again independent of s. Thus, if $s \geq 4$, there will be four cells in the diagonal corresponding to t = 5; but there will be only two cells in it if s = 3. Given t, we can say that $s \geq \tfrac{1}{2}t$, for example s > 2 if t = 5, but we cannot say that $s \geq (t - 1)$. The mere fact that t = 5 is therefore insufficient to assign a unique value to the conditional probability $P_{12.t}$.


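In the same sense we may speak of the number (x) of successes in an r-fold sample from an infinite two-class universe as a sufficient statistic of the parameter p. We may denote by $P_{(1.2.3\ldots r).p}$ the probability that the sample records successes and failures in a fixed order, there being $r_{(x)}$ different samples so distinguishable for the particular value x ($r_{(x)}$ being the number of ways of choosing x places out of r). Thus, with q = 1 − p, we may write:

$$P_{(1.2.3\ldots r).p} = p^x q^{r-x}, \qquad P_{(1.2.3\ldots r).x} = \frac{1}{r_{(x)}}, \qquad P_{x.p} = r_{(x)}\, p^x q^{r-x},$$

$$\therefore\; P_{(1.2.3\ldots r).p} = P_{(1.2.3\ldots r).x} \cdot P_{x.p}.$$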
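A quick numerical check of this factorization, assuming nothing beyond the formulas just given: for a fixed ordered sequence with x successes, the conditional probability $P_{(1.2.3\ldots r).x}$ comes out as $1/r_{(x)}$ whatever value of p we try, which is what makes x sufficient. The sequence and the trial values of p are arbitrary illustrations.

```python
from fractions import Fraction
from math import comb

def sequence_prob(seq, p):
    """Probability of one ordered sequence of successes (1) and failures (0)."""
    x = sum(seq)
    return p**x * (1 - p)**(len(seq) - x)

def conditional_given_x(seq, p):
    """P_{(1.2...r).x}: probability of seq divided by that of its success count."""
    r, x = len(seq), sum(seq)
    p_x = comb(r, x) * p**x * (1 - p)**(r - x)   # P_{x.p}
    return sequence_prob(seq, p) / p_x

seq = (1, 0, 1, 1, 0)   # an arbitrary illustrative sequence, r = 5, x = 3
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)):
    print(p, conditional_given_x(seq, p))   # 1/10 each time, since comb(5, 3) == 10
```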


One circumstance which gives the concept of sufficiency a peculiar importance vis-à-vis Fisher's approach to the problem of interval estimation is that it is not always possible to specify a sample by a statistic which is sufficient in his sense of the term. Since the fiducial probability distribution is in his formulation referable only to sufficient statistics, and indeed only to those sufficient statistics which are themselves referable to continuous distributions, the fiducial theory of interval estimation is of much more limited application on its own terms than is Neyman's theory of confidence intervals.*




* The following citation specifies the attitude of Fisher (1936)
to the concept of sufficiency:

This consideration is vital to the fiducial type of argument,
which purports to infer exact statements of the probabilities that
unknown hypothetical quantities, or that future observations, 
shall lie within assigned limits, on the basis of a body of observa- 
tional experience. No such process could be justified unless the 
relevant information latent in this experience were exhaustively 
mobilised and incorporated in our inference. 


Neyman’s preference for the expression inductive 
behaviour, in contradistinction to the traditional 
term inductive inference, forces on our attention a 
cleavage which admits of no compromise. It 
gets into focus an issue which must ultimately 
dictate our attitude to the place of statistical theory 
in scientific enquiry. It is not merely a revision of 
algebraic techniques or a matter of concern for 
the professional mathematician as such. It invites 
us to undertake a reorientation of our mental 
habits on a different plane. If we decide to adopt a 
consistently behaviourist viewpoint, the conse- 
quences will indeed be far more drastic than most 
of our contemporaries as yet foresee. 


4. INTERVAL ESTIMATION OF DIFFERENCES 


For reasons already set forth we assume that a prophylactic or therapeutic trial has as its end in view to assess what advantage will accrue from the substitution of one treatment, Treatment (B), for another elsewhere referred to as the yardstick treatment, Treatment (A). Since the customary null hypothesis procedure takes no cognizance of the operational intention of the trial, it is not unimportant to make this explicit; and since alternative and more recently prescribed statistical procedures presuppose definition of a measure of relative efficacy, it is not trivial to be equally explicit about the measure we prefer to use. When the method of scoring treatment efficacy is taxonomic, as defined in the previous communication, we shall here assume provisionally that our main concern will be the additional number of cases likely to benefit from the substitution of Treatment B for Treatment A; and the unequivocal measure of this advantage is the difference between the parameters $p_b$ and $p_a$, definitive of proportionate success in the putative universes of which the treatment groups are respectively samples. When the method of scoring is representative, our corresponding criterion of operational advantage will be the absolute difference between the corresponding true mean scores, if the latter are expressible (e.g. duration of stay in hospital) in terms directly relevant to costing or humane considerations. Otherwise, as when the score is referable to a laboratory test, the crude difference (in contradistinction to a ratio or other measure) may have no special merit in terms of the end in view, and considerations of algebraic tractability may dictate our preference for the measure we adopt. There may then be no objection to the use of normalizing score transformations which would otherwise obscure the end in view.

In this context, we may recall that choice of the method of scoring involves considerations other than statistical. In so far as we interpret the advantages of substituting one treatment for another in terms of the health or survival of the individual, the taxonomic method will commend itself as the one more relevant to the humane intention of the trial; but it is not always convenient, nor is it equally appropriate to costing the substitution vis-à-vis allocation of scarce resources. Growth is especially difficult to assess in terms referable to individual performance; and we shall commonly rely on a group mean or other representative score when growth is the criterion of efficacy, as in a dietetic trial. In some situations the claims of costing considerations may manifestly conflict with those of the individual, as when a treatment which lowers the mean duration of stay in hospital involves peculiar risks to a minority of patients.


The indisputably prior claim of taxonomic scoring 
in many types of trial has certain disadvantages from 
the view-point of the statistician. Any acceptable 
method of estimation in the taxonomic domain 
must prescribe the use of very large samples; and 
if it is true that we can formulate the distribution 
of a proportionate score difference referable to 
indefinitely large samples, we cannot as yet precisely 
specify how large they must be to justify our 
assurance that its use will not lead us astray. 


We have seen that confidence intervals are specifiable with reference to a discrete variate, such as the proportionate score of an r-fold sample from the two-class universe (Mop I (4)), only if we are content to express our uncertainty safeguard in the form $P_f \leq \alpha$. If the size of the sample is sufficiently large, we may invoke the normal approximation without sensible error. In that event, our uncertainty safeguard may permissibly take the form $P_f = \alpha$, and the solution of the problem will be as originally given by Wilson. When our concern is with a proportionate score difference, the assumption of normality does not suffice to assign an uncertainty safeguard $P_f = \alpha$ to statements about the limits between which the true value lies; but we shall now see that we can still work within the limitations of an uncertainty safeguard $P_f \leq \alpha$.


We shall here use the following symbols. We denote two treatment group efficacies by $p_a$ and $p_b$, their difference (the operational advantage of Treatment B) being $M_d = (p_b - p_a)$. Our sample estimates of $p_a$ and $p_b$ will be respectively $p_{a.s}$ and $p_{b.s}$, the observed difference being $M_{d.s} = (p_{b.s} - p_{a.s})$. The true variance of the difference distribution referable to the a-fold sample (Treatment Group A) and to the b-fold sample (Treatment Group B) is:

$$\sigma_d^2 = \frac{p_a(1 - p_a)}{a} + \frac{p_b(1 - p_b)}{b}$$


The normal approximation signifies that we may define as a normal variate of unit variance:

$$\frac{M_{d.s} - M_d}{\sigma_d}$$

In so far as this is legitimate we may place $M_d$ as below in a confidence interval at the $h\sigma$ level, i.e. with an uncertainty safeguard of approximately 5 per cent. if h = 2:

$$M_{d.s} - 2\sigma_d < M_d < M_{d.s} + 2\sigma_d \tag{vii}$$

We can specify $\sigma_d$ precisely only if we know the numerical values of $p_a$ and $p_b$, in which event we should also know the exact value of $M_d$. Our problem arises because we have no such knowledge. Within the framework of the assumption that we are dealing with a normal variate, we must therefore modify the form of our statement, as we can do in two ways.


First, we may make use of the fact that p(1 − p) is a maximum if p = ½, whence $\sigma_d^2$ cannot exceed the value it would have if $p_a = \tfrac{1}{2} = p_b$, in which event $M_d = 0$. Accordingly, we shall define a statistic:

$$\sigma_{d.s}^2 = \frac{1}{4a} + \frac{1}{4b} = \frac{a + b}{4ab} \tag{viii}$$

We may then frame a confidence rule in the form:

$$M_{d.s} - 2\sigma_{d.s} < M_d < M_{d.s} + 2\sigma_{d.s} \tag{ix}$$

Unless $p_a = \tfrac{1}{2} = p_b$ (in which event $M_d = 0$), the interval of length $4\sigma_{d.s}$ will be greater than the interval of length $4\sigma_d$, and the appropriate uncertainty safeguard will accordingly be less than the value (approximately 0.05) which holds for the interval defined by (vii), i.e. $P_f < 0.05$. If what we may usefully call the operational level of the trial is about 50 per cent., i.e. $\tfrac{1}{2}(p_b + p_a) \approx 0.5$, the uncertainty safeguard we can attach to a statement in terms of (ix) will not differ much from 0.05. Otherwise, it may be much less. It is therefore instructive to examine the consequences of using it for the case of equal samples a = r = b. If $q_a = (1 - p_a)$ and $q_b = (1 - p_b)$, we then have:

$$\sigma_d^2 = \frac{p_a q_a + p_b q_b}{r}, \qquad \sigma_{d.s}^2 = \frac{1}{2r}$$


For expository purposes, we shall now take a backstage view, i.e. we shall assume that we know the values of $p_a$ and $p_b$. To interpret our uncertainty safeguard precisely for a statement in the form prescribed by (ix), we can then restate our confidence interval in terms of $\sigma_d$ by the substitution $h\sigma_d = 2\sigma_{d.s}$; squaring both sides gives $h^2(p_a q_a + p_b q_b)/r = 4/(2r)$, so that $h^2(p_a q_a + p_b q_b) = 2$.


If $M_d = 0.1$ (operational advantage 10 per cent.), we then have:
(i) h = 2.01 and $P_f = 0.044$, when $p_a = 0.45$ (operational level 50 per cent.);
(ii) h = 2.83 and $P_f = 0.0046$, when $p_a = 0.1$ (operational level 15 per cent.).
Thus the upper limit we set to our uncertainty safeguard will be very conservative, in the sense that it may greatly overstate the probability of erroneous assertion within the framework of the rule, when $p_a$ and $p_b$ lie near the limit of the range for which we may invoke the normal approximation with propriety for equal samples of less than 100.
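The two numerical cases can be reproduced from $h^2(p_a q_a + p_b q_b) = 2$ and the two-sided normal tail $P_f = \operatorname{erfc}(h/\sqrt{2})$. The Python sketch below uses only the standard library; the function name `backstage_safeguard` is our own. Small differences in the last quoted digit arise if h is rounded before the tail area is looked up.

```python
import math

def backstage_safeguard(p_a, p_b):
    """Return (h, P_f) for rule (ix) with equal samples, given the true p_a and p_b."""
    q_a, q_b = 1 - p_a, 1 - p_b
    h = math.sqrt(2 / (p_a * q_a + p_b * q_b))   # from h^2 (p_a q_a + p_b q_b) = 2
    p_f = math.erfc(h / math.sqrt(2))             # two-sided normal tail beyond +/- h
    return h, p_f

for p_a in (0.45, 0.1):
    h, p_f = backstage_safeguard(p_a, p_a + 0.1)  # operational advantage M_d = 0.1
    print(f"p_a = {p_a}: h = {h:.2f}, P_f = {p_f:.4f}")
```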


Since the rule prescribed by (ix) may thus lead us to be over-cautious, however large the sample we take, we may usefully explore an alternative procedure. We define an unbiassed estimate of $\sigma_d^2$ based on sample values as:

$$s_d^2 = \frac{p_{a.s}(1 - p_{a.s})}{a - 1} + \frac{p_{b.s}(1 - p_{b.s})}{b - 1}$$

For large samples it will be sufficient to write:

$$s_d^2 = \frac{p_{a.s}(1 - p_{a.s})}{a} + \frac{p_{b.s}(1 - p_{b.s})}{b} \tag{x}$$
a b 


We may then define an interval by:

$$M_{d.s} - 2s_d < M_d < M_{d.s} + 2s_d \tag{xi}$$

If the sizes (a and b) of the samples we take are indefinitely large, $s_d$ will not differ sensibly from $\sigma_d$; and we can make an assertion in the form prescribed by (xi) with an uncertainty safeguard $P_f = \alpha \approx 0.05$. Actually, we have to deal with finite samples, and cannot therefore state with assurance that $P_f \leq 0.05$, since the sample statistic $s_d$ may be less or greater than $\sigma_d$. All we can say with assurance is that we shall not often err greatly if our samples are large.
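A minimal sketch of the two rules side by side, assuming only the definitions (viii) to (xi). The success counts are hypothetical, chosen to put the operational level near 15 per cent., where the rule of (ix) is most conservative; nothing here is drawn from the trials tabulated later, and the function names are our own.

```python
import math

def method_i(successes_a, a, successes_b, b, h=2.0):
    """Interval (ix): half-width h * sigma_{d.s}, with sigma_{d.s}^2 = (a + b) / (4ab)."""
    m_ds = successes_b / b - successes_a / a
    sigma_ds = math.sqrt((a + b) / (4 * a * b))
    return m_ds - h * sigma_ds, m_ds + h * sigma_ds

def method_ii(successes_a, a, successes_b, b, h=2.0, large_sample=True):
    """Interval (xi): half-width h * s_d, using (x), or the a - 1, b - 1 form if large_sample is False."""
    p_as, p_bs = successes_a / a, successes_b / b
    da, db = (a, b) if large_sample else (a - 1, b - 1)
    s_d = math.sqrt(p_as * (1 - p_as) / da + p_bs * (1 - p_bs) / db)
    m_ds = p_bs - p_as
    return m_ds - h * s_d, m_ds + h * s_d

# Hypothetical trial: 20/200 successes under A, 40/200 under B.
lo1, hi1 = method_i(20, 200, 40, 200)
lo2, hi2 = method_ii(20, 200, 40, 200)
print(f"Method I:  {lo1:.3f} .. {hi1:.3f}")
print(f"Method II: {lo2:.3f} .. {hi2:.3f}")
```

Both intervals share the centre $M_{d.s}$; only the half-width differs, and here the worst-case $\sigma_{d.s}$ gives the wider one.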


If we wish to design a trial to ensure that the confidence interval will be of length $C_i$, having no prior knowledge about the operational level as defined above, the best we can do is to use (ix), i.e.

$$4\sigma_{d.s} = 2\sqrt{\frac{a + b}{ab}} = C_i$$

It is advantageous to use treatment groups of equal size, because the proportionate score difference distribution approaches normality more rapidly as we increase (a + b) when a = r = b, in which event:

$$C_i = 2\sqrt{\frac{2}{r}}, \quad \text{i.e.} \quad r = 8C_i^{-2}$$


If the operational level of the trial is in the neighbourhood of 50 per cent., our uncertainty safeguard will then be less, but not greatly less, than 0.05 for the assertion that $M_d$ lies in a 10 per cent. interval ($C_i = 0.1$). All we can say in advance is that we should never need to use larger samples to justify the assertion $M_{d.s} - 0.05 < M_d < M_{d.s} + 0.05$ with an uncertainty safeguard not greater than 0.05. However, this procedure may prescribe samples of grossly excessive size if the operational level is near zero or near unity. Again, we may look at the issue heuristically. We suppose that $p_a = 0.1$ and $p_b = 0.2$, so that $2\sigma_d = r^{-1/2}$. If we set $C_i = 4\sigma_d$ we then have $r = 4C_i^{-2}$; and for $C_i = 0.1$ we shall require equal samples of 400. If we have at our disposal the results of a pilot trial, we shall have all the information relevant for a point estimate of the operational level; and we may be content to compromise between a specification of r referable to (ix) and one referable to (xi). We thus arrive at the following conclusions:

(i) if we wish to set our interval at a length sufficient 
to justify any useful assertion about the operational 
advantage of substituting Treatment B for Treatment A, 
we shall commonly require samples sufficiently large 
to justify recourse to the normal approximation; 

(ii) within the framework of normal quadrature as a 
computing device for summation of terms of a discrete 
distribution, and pending an approach to the problem 
by new methods which we hope to explore in a subse- 
quent communication, we must confine ourselves 
either to assertions which may be over-cautious or to 
assertions which may be unduly favourable to the 
new treatment. 
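The sample sizes discussed above follow from requiring the interval length $C_i$ to equal four standard deviations of $M_{d.s}$ (h = 2). A short exact-arithmetic sketch, our own and using the hypothetical helper `equal_group_size`, reproduces the 400 per group for $p_a = 0.1$, $p_b = 0.2$ and gives 800 per group, i.e. $8C_i^{-2}$, for the worst case assumed by rule (ix):

```python
from fractions import Fraction
import math

def equal_group_size(c_i, p_a=Fraction(1, 2), p_b=Fraction(1, 2)):
    """Smallest r with 4 * sigma_d <= c_i, where sigma_d^2 = (p_a q_a + p_b q_b) / r.

    With the default p_a = p_b = 1/2 this is the worst case used by rule (ix), r = 8 / c_i^2.
    """
    variance_sum = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil(16 * variance_sum / c_i**2)

c = Fraction(1, 10)                                           # a 10 per cent. interval
print(equal_group_size(c))                                    # rule (ix): no prior knowledge
print(equal_group_size(c, Fraction(1, 10), Fraction(2, 10)))  # p_a = 0.1, p_b = 0.2 known
```

Exact fractions are used so that the ceiling is not disturbed by floating-point error in $C_i^2$.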


To get into focus the major issues raised by 
interval estimation in the domain of representative 
scoring we should bear in mind two considerations: 


(a) the assumption that the normal distribution 
of unit variance tallies closely with that of the unit 
sample score expressed in standard form is highly 
gratuitous; 


(b) even when such an assumption is grossly erroneous, the sample size need not be large to ensure a close normal fit for the distribution of the sample mean and that of the difference between the means of samples from different universes. The last statement is true even when the number of score classes of the u.s.d. is small. Thus the error in assigning the sum of terms in the range $\pm 2\sigma_m$ by recourse to the normal integral with due regard to the half-interval correction is trivial for the 51 score-class distribution of the 10-fold sample mean from a six-class rectangular universe (e.g. unbiassed cubical die).
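The die example is easy to check exactly. The sketch below (ours, standard library only) builds the distribution of the total of ten throws by repeated convolution, which has the same 51 classes as the 10-fold mean, sums the terms within two standard deviations of the centre, and compares the result with the normal integral taken with the half-interval correction.

```python
import math

def dice_total_counts(n_throws, faces=6):
    """Number of ways of reaching each total with n_throws of a fair die (repeated convolution)."""
    counts = {0: 1}
    for _ in range(n_throws):
        new_counts = {}
        for total, ways in counts.items():
            for face in range(1, faces + 1):
                new_counts[total + face] = new_counts.get(total + face, 0) + ways
        counts = new_counts
    return counts

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, faces = 10, 6
counts = dice_total_counts(n, faces)
mean = n * (faces + 1) / 2                  # 35
sd = math.sqrt(n * (faces**2 - 1) / 12)     # standard deviation of the total
lo = math.ceil(mean - 2 * sd)               # lowest total inside +/- 2 sd
hi = math.floor(mean + 2 * sd)              # highest total inside +/- 2 sd

exact = sum(counts[t] for t in range(lo, hi + 1)) / faces**n
approx = normal_cdf((hi + 0.5 - mean) / sd) - normal_cdf((lo - 0.5 - mean) / sd)
print(len(counts), lo, hi, round(exact, 4), round(approx, 4))
```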


It is necessary to emphasize the foregoing dis- 
tinction for two reasons: 

(i) we cannot make any general statement about how large an error we incur, if we invoke the t-distribution of the mean score of samples from a normal parent universe;

(ii) the distribution of the difference between mean scores of samples from a normal universe is not itself a t-variate.


For any advantage we may derive in terms of 
economy of sample size by recourse to any method 
based on the t-distribution, we thus incur a liability
to unspecifiable error arising from the necessary 
assumption that the relevant parent distribution is 
normal. For this reason, we propose to defer 
to a later contribution discussion concerning the 
merits of small sample methods. The ensuing 
remarks outline the theory of a procedure analogous 
to (xi) above, like the latter admittedly inexact and 
subject to the same limitations. 


We postulate two unit sample distributions (A and B) whose means are $M_a$ and $M_b$ and variances $\sigma_a^2$, $\sigma_b^2$. We then denote the sample means of sizeable a-fold samples from A, and b-fold samples from B, as $M_{a.s}$ and $M_{b.s}$, distributed respectively with variance $\sigma_{m.a}^2 = a^{-1}\sigma_a^2$ and $\sigma_{m.b}^2 = b^{-1}\sigma_b^2$.

The variance of the distribution of the mean difference $M_{d.s} = (M_{b.s} - M_{a.s})$, whose true value is $M_d = (M_b - M_a)$, will then be $\sigma_d^2 = \sigma_{m.a}^2 + \sigma_{m.b}^2$. In this context we may assume that each constituent sample exceeds fifty. For reasons stated we may then without compunction postulate as a normal score of unit variance:

$$\frac{M_{d.s} - M_d}{\sigma_d}$$

At the $h\sigma$ confidence level ($P_f = \alpha$), we may then define the interval:

$$M_{d.s} - h\sigma_d < M_d < M_{d.s} + h\sigma_d \tag{xii}$$


In practice, we shall not know the value of $\sigma_d$. All we shall be able to specify is the unbiassed estimate $s_d^2$ of $\sigma_d^2$ based on the unit sample scores $x_{a.n}$, $x_{b.n}$, viz.:

$$s_d^2 = \frac{1}{a(a - 1)} \sum_{n=1}^{a} (x_{a.n} - M_{a.s})^2 + \frac{1}{b(b - 1)} \sum_{n=1}^{b} (x_{b.n} - M_{b.s})^2 \tag{xiii}$$

If a and b are indefinitely large, we may thus define our confidence interval at the $h\sigma$ level by:

$$M_{d.s} - hs_d < M_d < M_{d.s} + hs_d \tag{xiv}$$

We cannot assign the uncertainty safeguard $P_f = \alpha$ to an assertion of this form if referable to finite samples; but we may anticipate that gross error will be rare if we assign to it $P_f \approx \alpha$ when (a + b) > 200.
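Formula (xiii) is simply the sum of the two estimated variances of the means, each computed with divisor (n − 1). The sketch below, using hypothetical scores drawn from a fixed random seed rather than any trial data, computes (xiii) directly, can be checked against Python's `statistics.variance`, and forms the interval (xiv) with h = 1.96. The function names are our own.

```python
import math
import random
import statistics

def s_d_squared(scores_a, scores_b):
    """Unbiassed estimate (xiii) of the variance of the difference between two sample means."""
    a, b = len(scores_a), len(scores_b)
    m_as, m_bs = statistics.fmean(scores_a), statistics.fmean(scores_b)
    ss_a = sum((x - m_as) ** 2 for x in scores_a)
    ss_b = sum((x - m_bs) ** 2 for x in scores_b)
    return ss_a / (a * (a - 1)) + ss_b / (b * (b - 1))

def interval_xiv(scores_a, scores_b, h=1.96):
    """Interval (xiv) for M_d = M_b - M_a, treating s_d as if it were sigma_d."""
    m_ds = statistics.fmean(scores_b) - statistics.fmean(scores_a)
    s_d = math.sqrt(s_d_squared(scores_a, scores_b))
    return m_ds - h * s_d, m_ds + h * s_d

rng = random.Random(1)                                # hypothetical data, fixed seed
group_a = [rng.gauss(10.0, 3.0) for _ in range(120)]
group_b = [rng.gauss(11.0, 4.0) for _ in range(110)]  # a + b = 230 > 200
print(interval_xiv(group_a, group_b))
```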


5. OPERATIONAL ADVANTAGE IN RECORDED TRIALS 


While the performance of a significance test which invokes a unique null hypothesis irrelevant to the humane intention of a trial is now a universal ritual in statistical assessment of therapeutic or of prophylactic trials, it is fair to record that some authors associate with an observed difference ($M_{d.s}$) the statistic $s_d$ defined by expression (x) above as an estimate of the range of the universe parameter $M_d$, deemed to be significant in accordance with the outcome of the test. As we have seen in Section 3 above, interval estimation subsumes a test procedure embracing any conceivably relevant hypothesis. Hence the performance of a significance test as a preliminary to specification of the estimated range of a difference is an entirely redundant undertaking, which emphasizes how little the concept of interval estimation has penetrated the field under discussion.


In what follows, we shall use the published data
of 24 trials to illustrate results of applying the
statistics specified in Expressions (ix) and (xi).
Table I cites the source and reference number of 
each trial. Table II (opposite) cites the objectives 
and yardsticks of efficacy. Table III (overleaf) 
presents the analysis. 


TABLE I
SOURCE AND DATE OF 24 TRIALS

No.  Trial                           Source                          Date
 1   B.T. Malaria                    War Office*                     1948
 2   B.T. Malaria                    War Office                      1948
 3   Gonorrhoea                      War Office                      1948
 4   Gonorrhoea                      War Office                      1948
 5   Tuberculosis (prophylactic)     Aronson and Palmer              1946
 6   Whooping cough (prophylactic)   Medical Research Council        1951
 7   B.T. Malaria relapses           War Office                      1948
 8   Tuberculosis                    Medical Research Council        1948
 9   Tuberculosis                    Medical Research Council        1948
10   Influenza (prophylactic)        Francis, Salk, and Quilligan    1947
11   Gonorrhoea                      War Office                      1948
12   Gonorrhoea                      War Office                      1948
13   Gonorrhoea                      War Office                      1948
14   Burns                           Jackson, Lowbury, and Topley    1951
15   Burns                           Jackson, Lowbury, and Topley    1951
16   Burns                           Jackson, Lowbury, and Topley    1951
17   Measles (prevention)            Medical Research Council        1950a
18   Measles (attenuation)           Medical Research Council        1950a
19   Tuberculosis                    Medical Research Council        1950b
20   Tuberculosis                    Medical Research Council        1950b
21   Tuberculosis                    Medical Research Council        1950b
22   Enteric Fever 1907 (a)          Greenwood                       1935
23   Enteric Fever 1907 (b)          Greenwood                       1935
24   Common Cold                     Medical Research Council        1944

* "Statistical Report on the Health of the Army." H.M.S.O., London.





In Table III, the entry $100p_{a.s}$, the percentage efficacy of the yardstick treatment, is intended to give some indication of the operational level of the trial on the basis of available evidence. This is pertinent to both the main issues raised in the foregoing discussion, viz.:
TABLE II
TREATMENT AND CRITERION OF EFFICACY IN 24 TRIALS

No.  Trial                          Treatment A                        Treatment B                                  Criterion of Efficacy
 1   B.T. Malaria                   Mepacrine                          Quinine and pamaquin                         No relapse
 2   B.T. Malaria                   Mepacrine                          Quinine and pamaquin                         Only one relapse among those experiencing at least one
 3   Gonorrhoea                     Sulphathiazole or sulphapyridine   Penicillin                                   No recourse to further treatment
 4   Gonorrhoea                     Sulphathiazole or sulphapyridine   Penicillin                                   Permanent cure
 5   Tuberculosis (prophylactic)    Unvaccinated                       BCG vaccination                              Absence of disease
 6   Whooping cough (prophylactic)  Unvaccinated                       Vaccinated                                   No attack
 7   B.T. Malaria relapses          Paludrine at different levels      Quinine and pamaquin                         No relapse
 8   Tuberculosis                   Bed-rest                           Streptomycin                                 Clinical improvement (6 mths)
 9   Tuberculosis                   Bed-rest                           Streptomycin                                 Survival (… mths)
10   Influenza (prophylactic)       Unvaccinated                       Vaccinated                                   No attack
11   Gonorrhoea                     Sulphathiazole only                Sulphathiazole and KMnO4                     No recourse to further treatment
12   Gonorrhoea                     Sulphathiazole only                Sulphathiazole and HgOCN                     No recourse to further treatment
13   Gonorrhoea                     Sulphathiazole and HgOCN           Sulphathiazole and KMnO4                     No recourse to further treatment
14   Burns                          No polymixin                       Polymixin                                    No infection with Ps. pyocyanea
15   Burns                          …                                  …                                            …
16   Burns                          No polymixin                       Polymixin                                    Good graft
17   Measles (prevention)           Adult serum                        Globulin (at two levels)                     No attack
18   Measles (attenuation)          Adult serum                        Globulin (at two levels)                     Modification of symptoms
19   Tuberculosis                   Bed-rest                           Para-aminosalicylic acid                     Clinical improvement (6 mths)
20   Tuberculosis                   Para-aminosalicylic acid           Streptomycin                                 Marked radiological improvement (6 mths)
21   Tuberculosis                   Streptomycin                       Streptomycin with para-aminosalicylic acid   Clinical improvement (6 mths)
22   Enteric fever, 1907 (a)        Uninoculated                       Inoculated                                   No attack
23   Enteric fever, 1907 (b)        Uninoculated                       Inoculated                                   Survival
24   Common cold                    Control                            Patulin                                      Cured or improved (48 hrs)





(i) the adequacy of the normal approximation;

(ii) in what circumstances the interval specified in accordance with the rule of procedure embodied in Expression (ix) is likely to lead to an excessively cautious specification of the interval length assignable in terms of the upper limit (α) of the uncertainty safeguard ($P_f$).


For illustrative purposes we have hitherto defined the interval by h = 2 in alternative statements:
Method I of (ix) above: $M_{d.s} - h\sigma_{d.s} < M_d < M_{d.s} + h\sigma_{d.s}$;
Method II of (xi) above: $M_{d.s} - hs_d < M_d < M_{d.s} + hs_d$.


Table III exhibits the result of using each method for h = 1.64 and h = 1.96. In the limit, i.e. for indefinitely large samples, these respectively specify the uncertainty safeguards appropriate to Method II:
$P_f = 0.05 = \alpha$ for the one-sided assertion $M_d \geq M_{d.s} - 1.64s_d$;
$P_f = 0.10 = \alpha$ for the two-sided assertion $M_{d.s} - 1.64s_d \leq M_d \leq M_{d.s} + 1.64s_d$;
$P_f = 0.025 = \alpha$ for the one-sided assertion $M_d \geq M_{d.s} - 1.96s_d$;
$P_f = 0.05 = \alpha$ for the corresponding two-sided assertion.
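These four figures are normal tail areas, which can be confirmed with the standard library's `statistics.NormalDist` (Python 3.8 or later). The sketch is ours and makes the same large-sample assumption as the text; note that h = 1.64 gives a one-sided tail of about 0.0505, the conventional 0.05 strictly requiring h = 1.645.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

for h in (1.64, 1.96):
    one_sided = 1 - std_normal.cdf(h)   # P_f for M_d >= M_{d.s} - h s_d
    two_sided = 2 * one_sided           # P_f for the symmetric two-sided interval
    print(f"h = {h}: one-sided {one_sided:.4f}, two-sided {two_sided:.4f}")

# Inverse check: the h giving a two-sided P_f of 0.05.
print(round(std_normal.inv_cdf(0.975), 3))
```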


In practice we have to deal with finite samples; and we can merely assert $P_f \approx \alpha$. In different background situations, referable to different unknown values of $p_a$ and $p_b$, the application of the
TABLE III
ANALYSIS OF RESULTS OF 24 TRIALS

         Size of Treatment Group     Interval based on h = 1.64       Interval based on h = 1.96
Trial       A          B             Method I        Method II        Method I        Method II       100p_a.s
  1        650        584           19.0..28.4      20.0..27.4       …..29.3         …..28.1           66.0
  2        221         60            0.0..23.8       5.1..18.7       …..26.1         3.8..20.0         81.4
  3        764        251            …..30.5        20.0..29.2       …..31.7        19.1..30.1         61.9
  4        764        251           19.0..30.8      19.9..29.9       …..32.0        18.9..30.9         56.0
  5      1,457      1,550            …..13.1         8.5..11.7       …..13.7         8.2..12.0         87.3
  6      3,757      3,801           13.4..17.2      14.1..16.5      13.0..17.6      13.9..16.7         80.8
  7        215        107               …           19.2..35.6       …..39.0        17.5..37.3         53.0
  8         52         55            …..52.3        21.5..51.3      17.4..55.4      18.6..54.2         32.7
  9         52         33            …..40.3         9.8..39.0       …..43.4         7.0..41.8         53.8
 10      7,615     10,328          −0.4..+2.2        0.2..1.6      −0.6..+2.4        0.1..1.7          91.9
 11        569        204           20.6..34.2      22.6..32.0       …..33.3        21.6..33.0         62.9
 12        569         95            3.8..22.0       5.1..20.7       …..23.0         3.6..22.2         62.9
 13         95        204            4.2..24.6       6.4..22.4       …..26.6         4.9..23.9         35.8
 14         76         64            …..36.9        11.7..34.3       6.4..39.6       9.5..36.5         64.5
 15         76         64          −19.3..+8.5     −18.7..+7.9     −22.0..+11.2    −21.3..+10.5        38.2
 16         55         39           18.8..53.2      20.0..52.0       …..56.5        16.9..55.1         25.5
 17        215        212           17.6..34.4      18.0..33.2      16.1..35.1      16.5..34.7         42.8
 18        123         67           28.8..53.8      32.2..50.4      26.4..56.2      30.4..52.2         51.2
 19         52         59            …..38.8         8.2..38.2       4.6..41.8       5.2..41.2         32.7
 20         59         54            …..49.1        19.4..47.8       …..52.1        16.6..50.6         22.0
 21         54         53           −3.2..+28.6      0.3..25.1       …..31.7       −2.1..+27.5         74.1
 22        220        430           33.4..47.0      34.7..45.7       …..48.3        33.7..46.7         56.8
 23        220        430               …            3.1..8.3      −2.4..+13.8       2.6..8.8          94.1
 24        680        668           −8.4..+0.4     −7.8..−0.2        …..+1.3       −8.5..+0.5         73.0

(…: entry illegible in the source copy.)





rule to finite samples will lead to different risks of false assertion, some greater and some less than α; and this may lead us to an unduly optimistic assessment of the operational advantage. Our uncertainty safeguard takes the unequivocal form $P_f \leq \alpha$ if we use Method I, but the length $2h\sigma_{d.s}$ of the two-sided confidence interval will be greater than the length $2h\sigma_d$ of the confidence interval based on the true value of $\sigma_d$ and on the value of h which suffices to specify α as the upper limit of $P_f$. As explained above, the penalty of making our assertion over-cautious in this sense will be less exacting if the true operational level of the test, elsewhere defined, is near 50 per cent. We cannot know the true operational level, $50(p_a + p_b)$ per cent.; but our sample statistic $100p_{a.s}$ suffices to give what indication the figures can yield.
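Because the Method I half-width $h\sigma_{d.s}$ depends only on the group sizes, it can be recomputed from the first columns of Table III. The sketch below (ours) does so for three rows whose Method I entries for h = 1.64 are fully legible, taking the interval ends as printed; agreement is to within the rounding of the table.

```python
import math

def method_i_half_width(a, b, h):
    """Half-width h * sigma_{d.s}, in percentage points, with sigma_{d.s}^2 = (a + b) / (4ab)."""
    return 100 * h * math.sqrt((a + b) / (4 * a * b))

# (trial, a, b, printed Method I interval for h = 1.64), transcribed from Table III
rows = [(1, 650, 584, (19.0, 28.4)),
        (6, 3757, 3801, (13.4, 17.2)),
        (22, 220, 430, (33.4, 47.0))]

for trial, a, b, (lower, upper) in rows:
    computed = method_i_half_width(a, b, 1.64)
    printed = (upper - lower) / 2
    print(f"trial {trial}: computed {computed:.2f}, printed {printed:.2f}")
```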


6. SUMMARY 


(1) The operational intention of a prophylactic 
or a therapeutic trial is to assess the advantage 
of substituting one treatment (Treatment B) for 
another (Treatment A). This is commonly expressible 
in the form: by how much does the success rate 
or measure for Treatment B exceed that for Treat- 
ment A? 


(2) In the theory of inference we adopt, we aim at making statements which are true within the restriction of an uncertainty safeguard ($P_f$) specifying an acceptably small probability of false assertion. Within these terms of reference two forms of statistical procedure are available. Of these, the method of interval estimation alone can supply a terminal statement of a suitable form. The complete statement in the general case will then be that $P_f \leq \alpha$ for the assertion that the true difference ($M_d$) lies between two values calculable from the data supplied by the trial.


(3) We may regard the procedure of interval 
estimation as the successive performance of a 
significance test to each of an infinitude of admissible 
hypotheses, including, of course, the conventional 
null hypothesis that the two treatments are of 
equal efficacy. Accordingly, the performance of 
a significance test referable to any such unique 
null hypothesis as a preliminary to estimation is 
redundant. 


(4) Certain difficulties still beset the specification of a wholly satisfactory procedure of estimation appropriate to prophylactic and therapeutic trials. In the domain of taxonomic scoring, which is commonly more convenient and more consistent with the humane intention of the trial, an available method for assigning confidence intervals to a difference becomes exact only as the size of the samples becomes indefinitely large. In the representative domain, a method with the same limitations is also available. In each case further investigation is requisite to assess the risks associated with the use of such approximate formulae.


(5) Objections to the use of approximate methods assume a less formidable aspect when we give due consideration to the fact that the method of interval estimation can offer no prospect of locating the operational advantage of substituting one treatment for another within an interval of acceptable length, associated with an acceptable uncertainty safeguard, unless we are indeed free to use large samples.


ADDENDUM 


In replacing the appeal to the customary unique 
null hypothesis by the concept of operational advan- 
tage, we bring more clearly into focus an issue of 
outstanding importance with respect to the intention 
of any method of statistical assessment. When 
confronted with an interval estimate of the type 
here discussed, the administrator may well enquire 
in what circumstances he may regard it as relevant 
to policy. If the statistician replies that he has said 
all he can legitimately say about the relative merits 
of two treatment procedures on the basis of the 
evidence at his disposal, he may then refrain from 
asking how far the evidence at the disposal of the 
statistician is in fact relevant to unspecified circum- 
stances incident to the trial. Though there is a deep 
cleavage between opposing schools of current 
thought, theoretical statisticians with otherwise 
widely diverse views seem to make common cause 
in seeking interpretations consistent with two 
assumptions: 

(a) with proper precautions we may regard each 
treatment group as a sample taken randomwise 
from an infinite population;


(b) on the same understanding, the same homo- 
geneous infinite population is the source of any 
other such sample chosen subsequently with the 
same precautions. 

The circumstances in which such assumptions are 
more or less relevant to the use we make of an inter- 
val estimate are not self-evident nor immune from 
legitimate scrutiny; and we hope to examine in a 
later communication the propriety of invoking 
statistical theory as a basis for extrapolation beyond
the limits of a clearly defined framework of 
repetition. 
COHORT ANALYSIS OF FERTILITY
IN ENGLAND AND WALES, 1939-50 


BY 


WALLIS TAYLOR 


Department of Medical Statistics, University of Birmingham, and Central Statistical Office, City of Birmingham 


(1) INTRODUCTION 


The relevance to social medicine of changes in the 
age-structure of a community needs little amplifica- 
tion. A previous communication (Taylor, 1951) 
demonstrated the importance of adequate provision 
for a community in which survival to a level pre- 
viously entitled “old age” would be commonplace.
Since mortality is so low in the middle age ranges, 
this communication focuses on the other end of 
life and seeks to interpret the recent rapid changes 
in the birth rate with a view to appropriate provision
of hospital and ancillary services, the need for which 
depends on the current number of maternities. 
Though the issue is of importance in other spheres, 
e.g. in the demand for educational services, it has 
special relevance to the National Health Service, 
if we reflect upon the cost of maternity and child 
welfare services before, during, and after birth. 


Since a falling birth rate does not necessarily 
mean a decline in primiparae (which of all parities 
mainly occupy hospital beds), it is essential to 
interpret a changing birth rate in terms of parities 
and size of family if the outcome is to be a basis for 
costing medical services. Before the 1939-45 war, 
the steadiness of the gross and net reproduction 
rates made the accurate forecasting of future births 
under different assumptions a matter of arithmetic. 
Dispute was then about assumptions rather than 
about method. Since 1939, the rapid fluctuation 
of the rates (Fig. 1) has made prognostication 
fruitless without recourse to more refined analysis. 
The need for new methods is imperative; and this 
paper sets out one such, the cohort method, new 
in this context although previously employed in 
medical research (vide infra). This method is then 
used to analyse recent fertility in England and 
Wales and to estimate its significance for replacement. 


In one of the very few recent analyses of fertility 
applicable to England and Wales, Hajnal (1947) 
asks the following questions: 
Is the elimination of the complications due to marriage
all that demographic analysis can do? Is the
next stage to resort to “sociological” explanation, such 
as that the increased fertility rates are due to “full
employment”, “family allowances”, etc.? . . .

If demographic analysis can reveal no regularity under- 
lying irregular fluctuations in fertility rates as large as 
those which have recently occurred, it would provide very 
little basis for a reasoned discussion of population trends. 
For all such discussion assumes that a reasonably orderly 
and smooth development of fertility rates may be 
expected. 


Hajnal’s questions imply a doubt widely current 
among well-informed persons, viz.: What relevance 
to the contemporary scene have the demographic 
studies undertaken in the inter-war period, when 
the methods exploited by R. Kuczynski, Enid Charles, 
and other demographers were adequate to demon- 
strate consistently falling fertility and its immediate 
effect on the age-composition of the population?


A temporary reversal of the pre-war trend during 
the war and the immediate post-war years has en- 
couraged the idea that earlier work on the social 
agencies which determine fertility has little relevance 
to present conditions, and that forecasts suggested 
by the course of events during the inter-war period 
are invalid. 


This idea is due in part to a reason adumbrated 
in earlier remarks. The credentials of composite 
indices of fertility such as the gross and net reproduc- 
tion rates presume a relatively static situation; in a 
rapidly-changing situation the picture they disclose 
may be highly misleading. For instance, a newly- 
imposed and powerful incentive to earlier marriage 
may result in a sudden rise of such rates shortly 
followed by a spectacular fall during a period in 
which fertility* does not materially change. 

* Throughout this communication the term “fertility” is used, as
all demographers use it, to indicate the current rate of producing
progeny. Biologists use the term with emphasis on the ability to
produce progeny, a concept for which demographers use the term
“fecundity”.
Fig. 1. Fertility and marriage rates, England and Wales, 1868-1951.


It is the writer’s aim to show that smooth trends
did exist in the disturbed period from 1939 to 1949,
and are apparent by recourse to analytical methods
necessarily more intricate than previously required.
Contrary to many opinions expressed in the recent
past, we shall see that replacement is not happening
in England and Wales, and that a further decline
in fertility is indicated. The method used to this end
is the cohort method.*
The earliest publications illustrating cohort analysis
seem to be those of Kermack, McKendrick, and
McKinlay (1934) on mortality rates, and of Barclay
and Kermack (1937-8) on fertility; but these
pioneer contributions are the more creditable
against the background of the inadequate source
material then available. Until 1939, when the
Registrar-General was first able to publish a break-
down of births by maternal age, requisite data for
application of the cohort method to fertility were
not available. Karmel (1949) has apparently since
prepared material suitable for a cohort analysis,
but uses it to develop an annual measure, the
index of current marriage fertility in Australia.
In the U.S.A. a monograph by P. K. Whelpton has
apparently substantially advanced the analysis of
American fertility by cohort methods. Although the
work is as yet unpublished, a review by Kiser (1952)
describes a novel approach, overcoming the defects
of less refined data than are available in Great
Britain by the use of “actual” and “hypothetical”
cohorts.


* This is, of course, analogous to a Family Census, and en passant
it is perhaps advisable to detail the reasons for not using the Family
Census conducted by the Royal Commission on Population. The
first is that the Census took place in 1946, before the birth boom,
so that it is not possible to continue the analysis. The second is that
the published volume (1950) has tables for Great Britain not split for
age and parity, and it is impossible to understand the situation without
a complete analysis. The third is that the marriage populations used
seem to be erroneous. Fourthly, the Census (a sample census) was
voluntary, and of course excluded all data for mothers who had
died before the Census date.


(2) THE COHORT METHOD 


In the domain of mortality or morbidity, the 
statistical procedure referred to as the cohort 
method is comparatively simple both in conception 
and in execution. Given a breakdown of deaths 
by age over a sufficiently long period, it is possible 
to assemble for all persons born in a given calendar 
year, i.e. of one and the same cohort, the death 
and survival rates at each year of life up to the age 
attained by the cohort at the latest date to which the 
statistics are referable. For completed cohorts
it is thus possible to make a life-table which sum- 
marizes the mean duration of life of persons subjected 
to the same initial handicaps, in terms of conditions 
prevailing at time of birth, in contradistinction to 
the customary composite life-table which embodies 
what would happen if current risks of death at each 
age remained constant throughout a generation. 
The constitution of such a cohort life-table is 
naturally more laborious than that of the alternative, 
but is otherwise straightforward. 

While the cohort life-table gives a much more 
precise picture of the changing hazards of health, 
it is evidently of limited utility, since it can be com- 
plete only for cohorts already extinguished at the 
current calendar year, and will, therefore, be most 
instructive only when our concern is with trends 
referable to a much earlier date. From this point of 
view, the cohort approach is of much greater 
utility in relation to our present theme because the 
period of fertile married life, especially under modern 
conditions, is quite a small fraction of a generation. 
On the other hand, its application involves special 
difficulties apart from the fact that a breakdown of 
births by maternal age or duration of marriage is 
available only in recent British statistics (since 1939) 
and is not as yet obtainable for most countries. 
For cohort analysis of mortality, we need to know 
only births and deaths by calendar year and age 
at death, in order to relate deaths at age $x$ in calendar
year $y_m$ back to birth year $y_m - x$. For cohort
analysis of fertility we need to know births, marriages 
by age, and both calendar year and duration of 
marriage at birth. 
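The regrouping for mortality described above amounts to re-keying each tabulated count by birth year $y_m - x$. The sketch follows the text's simple relation and ignores the refinement that deaths at age $x$ in one calendar year straddle two birth years. The function name and the counts are hypothetical.

```python
from collections import defaultdict

def deaths_by_cohort(period_deaths):
    """Regroup deaths tabulated by (calendar year y_m, age x) into birth cohorts y_m - x."""
    cohorts = defaultdict(dict)
    for (year, age), deaths in period_deaths.items():
        cohorts[year - age][age] = deaths
    return dict(cohorts)

# Hypothetical counts, for illustration only.
period = {(1950, 0): 30, (1951, 1): 12, (1951, 0): 28, (1952, 2): 5}
print(deaths_by_cohort(period))
```

The fertility case cannot be handled by a single subtraction of this kind. It needs marriages by age and births by both calendar year and duration of marriage.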

A special difficulty besetting the use of the cohort
method in connection with fertility, in contradistinc-
tion to mortality, when the requisite data are
indeed available, arises from the fact that procreation
is not, as is death, a unique event. To marriages in 
one and the same year there may and will commonly 
be more than one birth before the extinction of the 
fertility cohort, i.e. termination of the reproductive 


period. This, of course, makes the assembly of 
relevant data referable to one and the same cohort 
somewhat more laborious, even if we confine our 
attention to total births. If figures tabulated by 
parity as well as by duration of marriage for one 
and the same calendar year are available, we can 
also make, but with considerable additional labour, 
a more refined analysis, exhibiting how the propor- 
tion of first, second, etc., births is changing pari 
passu with the changing size of the completed family. 
Such refinement is useful for the reason already 
stated, i.e. that primiparae are subject to peculiar 
hazards, but it is also essential if we are to draw 
any legitimate conclusions about current trends 
on the basis of cohorts as yet unextinguished in the 
sense defined. 

Thus, if we investigate fertility over a short period, 
and the cohort has achieved substantial fertility 
at the durations investigated, any change in the 
rate may have a small effect measured as a propor-
tion of all the previous cohort births. By segmenting
into parities, it may, however, be obvious that 
external events have stimulated fertility in, for 
example, families already having three children, 
so that a high proportion of such families, which 
may themselves be only a small proportion of a 
segmented cohort, have added a further birth. The 
importance of this knowledge is obvious if we wish 
to measure the relative efficacy and reaction of 
external stimulus to fertility. 
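A small arithmetic illustration of the point just made follows. All figures are hypothetical. A cohort has 2,000 births so far, and 100 of its families already have three children. If the number of those families adding a fourth birth in a year rises from 10 to 40, the cohort's births rise by only 1.5 per cent. Within the parity segment, however, the rate quadruples.

```python
def segment_vs_aggregate(previous_births, segment_size, adding_before, adding_after):
    """Compare a change seen against all cohort births with the same change
    seen within one parity segment (all arguments are counts)."""
    extra_births = adding_after - adding_before
    aggregate_change = extra_births / previous_births
    rate_before = adding_before / segment_size
    rate_after = adding_after / segment_size
    return aggregate_change, rate_before, rate_after

# Hypothetical cohort, for illustration only.
change, before, after = segment_vs_aggregate(2000, 100, 10, 40)
print(f"change relative to all previous births: {change:.1%}")
print(f"parity-4 rate among 3-child families: {before:.0%} -> {after:.0%}")
```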

In the application of the cohort method to the 
study of fertility, our problem is for a further reason 
less simple than those which arise in the study of 
mortality. When our concern is with mortality 
experience, the year of birth of a constituent indi- 
vidual is a sufficient criterion for the identification 
of the cohort. When our concern is with fertility 
experience, we have to reckon with the fact that 
in one and the same calendar year individuals 
marry at different ages. Since age at marriage 
influences both total fertility and the fertility 
pattern, i.e. spacing of births, it will thus be necessary 
to follow separately not merely the history of all 
marriages referable to a particular calendar year, 
but all marriages referable to different age groups 
in the one calendar year. The reader may here ask 
whether it would be necessary to make this distinc- 
tion if we adopted the same criterion by identifying 
the constitution of the cohorts for problems of 
either type, i.e. if we define our fertility cohorts by 
the mother’s year of birth. The answer is that our 
special concern in this context is to evaluate the 
fertility trend obscured by agencies simultaneously 
and successively promoting both earlier marriage 
and delay of procreation. From this point of view, 
the fertility experience of a woman born in a particular
calendar year tells us very little if we do not
know her age at marriage, e.g. the fertility experience 
of a woman born in 1900, up to the end of her 
reproductive life (say 1945) will be different if she 
is married in 1930 from what it will be if she is 
not married until 1940. Similarly, the fertility 
experience of a woman married in 1940 at the age 
of 30 will be different from that of a woman married 
in the same year at the age of 20. 

A helpful metaphor to bring into focus the 
differences between indices such as the expectancy 
of life in the customary life-table and the gross 
reproduction rate on the one hand, and what 
emerges from the cohort approach on the other, 
is to liken the first to a snapshot of a procession 
and the second to a moving picture. If the procession 
is orderly and uniform, the latter may give little 
information beyond that disclosed by the former, 
and our acceptance of the customary life-table 
indices as informative measures of community 
health is consistent with experience when the 
community is not subject to major epidemics or 
calamities. If we imagine the occurrence of some 
such great catastrophe, the inadequacies of the 
snapshot approach of the actuary will be evident. 
In the year of an epidemic, prevailing death rates 
rise over a wide range of age groups, and the so- 
called mean expectation of life, i.e. mean duration 
calculated on the basis of current rates, falls accord- 
ingly; but the effect is operative for each completed 
cohort during one year only and is diluted ac- 
cordingly by the prevailing risks in previous years. 
Consequently there is no distortion of the secular 
trends except in so far as the risk of death depends 
on the age reached by the cohort during the year 
of epidemic. The situation which is our concern in 
this investigation is essentially an epidemic situation 
in which for comparable, though more diverse, 
reasons the snapshot approach is wholly misleading. 
We have to deal with a period in which two variables 
distort the picture. There have been large fluctua- 
tions of nuptiality in circumstances which have 
both hastened and delayed the assumption of 
the responsibilities of parenthood. 
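The contrast between snapshot and moving picture in an epidemic year can be sketched numerically. The sketch assumes a hypothetical constant annual risk of death of 0.02 at every age, doubled in the epidemic year. The snapshot applies the epidemic year's rates to a whole generation. A real cohort meets the epidemic once, here at age 30. The snapshot expectation of life collapses, while the cohort's is scarcely changed.

```python
def expectation_of_life(death_probs):
    """Curtate expectation of life at birth: sum of probabilities of surviving to each birthday."""
    survivors, total = 1.0, 0.0
    for q in death_probs:
        survivors *= 1.0 - q
        total += survivors
    return total

AGES = 100
normal = [0.02] * AGES          # hypothetical annual risk of death at each age
epidemic_year = [0.04] * AGES   # the same risk doubled at every age in the epidemic year

normal_e0 = expectation_of_life(normal)
period_e0 = expectation_of_life(epidemic_year)   # snapshot of the epidemic year

cohort_rates = normal.copy()
cohort_rates[30] = 0.04                           # cohort meets the epidemic at age 30 only
cohort_e0 = expectation_of_life(cohort_rates)

print(f"normal {normal_e0:.1f}, cohort {cohort_e0:.1f}, epidemic-year snapshot {period_e0:.1f}")
```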

The essential difference between the snapshot 
and the moving picture is felt in more than one 
medical situation. Since it would be true to say 
that hospital reports still publish figures of duration 
of stay based on current-year averages, it is not 
trivial to remark that only two indices of duration 
are meaningful. We might adopt the snapshot 
approach by taking a census on a given date and 
applying the conventional life-table method to 
interpret its meaning as an overall picture of current 


conditions; but the usefulness of this procedure, 
not as yet applied to the writer’s knowledge, is 
open to question because the figures available for 
any one census day, being small, are liable to a 
large sampling error. In practice, therefore, we 
have no option in choosing an appropriate procedure, 
i.e. we must base any figures for duration of stay 
on patients already discharged, and therefore on
figures of duration referable to admissions not all 
necessarily in the same calendar year. In short, 
circumstances force us to adopt what is in principle 
the cohort method; but we then do so less because 
hospital populations are subject to violent fluctuation 
than because the snapshot approach would be 
inconvenient and the disadvantages of the cohort 
approach vis-à-vis the end in view, i.e. a statement
referable to current conditions, are trivial in most 
situations.* With regard to duration of stay in 
hospital one may say that the snapshot approach 
duplicates and exaggerates all the disadvantages 
of crude birth and death rates. 

It may help to clarify the need for a new approach 
to an evaluation of current trends of fertility if we 
schematically contrast situations in which the 
incidence of marriages and births, singly or simul- 
taneously, may alter without any concurrent change 
in fertility as exhibited in Tables I-III below.

Table I exhibits the fertility experience of a 
hypothetical static population with fixed size of the 
completed family throughout the period covered, 
the only variable affecting the total number of births being
a temporary rise in the marriage rate followed by a 
compensatory temporary fall below the former 
stable level to which it eventually returns. For 
illustrative purposes, we assume that all marriages 
produce 2-child families spaced in the same way, 
a first birth occurring in the second year of marriage 
and a second birth in the fifth year of marriage. 
Here there is evidently no difficulty in recognizing 
the significance of the relevant situation. 
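The scheme of Table I can be generated mechanically from the marriage series and the fixed spacing stated above. Each marriage is taken to yield a first birth in the year after marriage, its second year, and a second birth four years after marriage, its fifth year. The function name is hypothetical. From year y+4 onwards the totals it produces agree with the Total column of Table I.

```python
def schematic_births(marriages, first_lag=1, second_lag=4):
    """Annual first, second and total births for the 2-child scheme of Table I.

    marriages[t] is the number of marriages in year y+t; every marriage yields
    a first birth first_lag years later and a second birth second_lag years later.
    """
    n = len(marriages)
    first = [0] * n
    second = [0] * n
    for t, m in enumerate(marriages):
        if t + first_lag < n:
            first[t + first_lag] += m
        if t + second_lag < n:
            second[t + second_lag] += m
    total = [f + s for f, s in zip(first, second)]
    return first, second, total

# Marriage series of Table I for years y .. y+16.
marriages = [100] * 6 + [200] * 2 + [50] * 4 + [100] * 5
first, second, total = schematic_births(marriages)
print(total[4:])  # totals from year y+4, when both births are under way
```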

Table II exhibits a hypothetical static population 
with fixed nuptiality. As before, fertility is also 
constant over the period covered and at the same
level, the spacing of the family being initially as
in Table I. Here, however, we interpolate a temporary
phase during which the second birth follows
more rapidly after the first, the second birth then 
occurring in the third year of marriage. Thus we 
may distinguish three periods. In the first and in the 
last, all second births will be referable to one cohort 
but during the middle period they will be referable 
to different cohorts. Again it is not difficult to see 
what has really happened. 
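Table II follows from the same rule once the spacing of the second birth is allowed to depend on the marriage cohort. Reading the table, the cohorts married in y+6, y+7 and y+8 have their second birth in the third year of marriage, a lag of two years. All others have it in the fifth year. The sketch reproduces the Total column from y+4 onwards. The function names are hypothetical.

```python
def births_with_spacing(marriages, second_lag_for_cohort, first_lag=1):
    """Annual total births when the second-birth interval varies by marriage cohort."""
    n = len(marriages)
    total = [0] * n
    for t, m in enumerate(marriages):
        for lag in (first_lag, second_lag_for_cohort(t)):
            if t + lag < n:
                total[t + lag] += m
    return total

def second_lag(t):
    """Cohorts y+6 .. y+8 space the second birth more closely (Table II)."""
    return 2 if 6 <= t <= 8 else 4

marriages = [100] * 17          # constant nuptiality, years y .. y+16
total = births_with_spacing(marriages, second_lag)
print(total[4:])
```

Every cohort still has exactly two births. Only the calendar years in which the second births fall have moved.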

* Mental hospitals constitute a special case, since clearance for most
other types of hospital cases does not involve a delay of more than
18 months if we base our figures on discharge dates.
TABLE I

CHANGE IN INCIDENCE OF MARRIAGE (BIRTHS BEING CONSTANT)

(Figures in parentheses give the corresponding date of marriage of the cohort.)

Marriages                                            Births
Number  Year   Phase                          First        Second       Total
 100    y      1 (stable)                     -            -            -
 100    y+1                                   100 (y)      -            -
 100    y+2                                   100 (y+1)    -            -
 100    y+3                                   100 (y+2)    -            -
 100    y+4                                   100 (y+3)    100 (y)      200
 100    y+5                                   100 (y+4)    100 (y+1)    200
 200    y+6    2 (temporary rise)             100 (y+5)    100 (y+2)    200
 200    y+7                                   200 (y+6)    100 (y+3)    300
  50    y+8    3 (secondary fall)             200 (y+7)    100 (y+4)    300
  50    y+9                                    50 (y+8)    100 (y+5)    150
  50    y+10                                   50 (y+9)    200 (y+6)    250
  50    y+11                                   50 (y+10)   200 (y+7)    250
 100    y+12   4 (stable at yth-year level)    50 (y+11)    50 (y+8)    100
 100    y+13                                  100 (y+12)    50 (y+9)    150
 100    y+14                                  100 (y+13)    50 (y+10)   150
 100    y+15                                  100 (y+14)    50 (y+11)   150
 100    y+16                                  100 (y+15)   100 (y+12)   200
TABLE II

CHANGE IN INCIDENCE OF BIRTHS (MARRIAGES BEING CONSTANT)

(Figures in parentheses give the corresponding date of marriage of the cohort.)

Marriages                                  Births
Number  Year   Phase   First        Second                      Total
 100    y      1       -            -                           -
 100    y+1            100 (y)      -                           -
 100    y+2            100 (y+1)    -                           -
 100    y+3            100 (y+2)    -                           -
 100    y+4            100 (y+3)    100 (y)                     200
 100    y+5            100 (y+4)    100 (y+1)                   200
 100    y+6    2       100 (y+5)    100 (y+2)                   200
 100    y+7            100 (y+6)    100 (y+3)                   200
 100    y+8            100 (y+7)    100 (y+4) and 100 (y+6)     300
 100    y+9    3       100 (y+8)    100 (y+5) and 100 (y+7)     300
 100    y+10           100 (y+9)    100 (y+8)                   200
 100    y+11           100 (y+10)   -                           100
 100    y+12           100 (y+11)   -                           100
 100    y+13           100 (y+12)   100 (y+9)                   200
 100    y+14           100 (y+13)   100 (y+10)                  200
 100    y+15           100 (y+14)   100 (y+11)                  200
 100    y+16           100 (y+15)   100 (y+12)                  200

Within the framework of the same highly schematic 
assumptions common to the two foregoing situations, 
namely fixed fertility and a static population, we 
may simultaneously vary, as in Table III, both the 
nuptiality rate and the spacing of the family for a 
brief period, returning to a pre-existing stable 
level of both. Evidently the record of total births 
now discloses a much more blurred picture. From 
this we can discern the significance of changes in 
birthrate, with a view to disclosing the true change 
in fertility suspected to have occurred, only if we 
can separately evaluate the fertility of marriages 
referable to one calendar year. If this is true of a 
situation so highly simplified for heuristic purposes, 


it must be true a fortiori of any real situation we are 
likely to encounter, since a real population will not be 
static and the effect of any major change with 
respect to nuptiality, family spacing, or total 
fertility will be confused by minor fluctuations, 
especially in communities where family limitation 
has become progressively more common. 
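The claim that cohort-by-cohort tabulation resolves the blurred picture of Table III can be checked directly. The sketch combines the marriage series of Table I with the closer spacing of Table II, which applies to cohorts y+6 to y+8. Annual totals then swing between 50 and 500. Tallying births by marriage cohort nevertheless recovers exactly two births per marriage for every completed cohort. The function names are hypothetical.

```python
from collections import Counter

def simulate(marriages, second_lag_for_cohort, first_lag=1):
    """Return annual total births and births credited to each marriage cohort."""
    n = len(marriages)
    annual = [0] * n
    per_cohort = Counter()
    for t, m in enumerate(marriages):
        for lag in (first_lag, second_lag_for_cohort(t)):
            if t + lag < n:
                annual[t + lag] += m
                per_cohort[t] += m
    return annual, per_cohort

marriages = [100] * 6 + [200] * 2 + [50] * 4 + [100] * 5    # as in Table I
annual, per_cohort = simulate(marriages, lambda t: 2 if 6 <= t <= 8 else 4)

print("annual totals from y+4:", annual[4:])
# Cohorts y .. y+12 have both births inside the period covered.
print("births per marriage:", [per_cohort[t] / marriages[t] for t in range(13)])
```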

The nature of the cohort method of exhibiting
current trends of fertility, and the reason why it is
essential to adopt such a method in present circum-
stances, have thus been briefly indicated. It will suffice
if we relegate to the Technical Appendix (p. 242)
details of assembling the requisite data from the
available official statistics. Here, for illustrative
purposes, the values for one, the 1939 cohort, are
given (Appendix Table A, p. 243).


TABLE III

COINCIDENT CHANGES IN INCIDENCE OF MARRIAGE AND SECOND BIRTHS

(Figures in parentheses give the corresponding date of marriage of the cohort.)

Marriages                       Births
Number  Year    First        Second                      Total
 100    y       -            -                           -
 100    y+1     100 (y)      -                           -
 100    y+2     100 (y+1)    -                           -
 100    y+3     100 (y+2)    -                           -
 100    y+4     100 (y+3)    100 (y)                     200
 100    y+5     100 (y+4)    100 (y+1)                   200
 200    y+6     100 (y+5)    100 (y+2)                   200
 200    y+7     200 (y+6)    100 (y+3)                   300
  50    y+8     200 (y+7)    100 (y+4) and 200 (y+6)     500
  50    y+9      50 (y+8)    100 (y+5) and 200 (y+7)     350
  50    y+10     50 (y+9)     50 (y+8)                   100
  50    y+11     50 (y+10)   -                            50
 100    y+12     50 (y+11)   -                            50
 100    y+13    100 (y+12)    50 (y+9)                   150
 100    y+14    100 (y+13)    50 (y+10)                  150
 100    y+15    100 (y+14)    50 (y+11)                  150
 100    y+16    100 (y+15)   100 (y+12)                  200


(3) CONTEMPORARY PICTURE


The foregoing sections have briefly indicated the
nature of the problem with which this communica-
tion deals, and the need for more refined methods
than have hitherto sufficed for basing an intelligent
prognosis upon data which provoke us to ask:
Do recent events force us to repudiate conclusions
suggested by the experience of the inter-war years?

What immediately follows states briefly what is
essentially new about the fertility experience of the
decade 1939-49, and how we propose to employ
the cohort method outlined in Section 2 in an
evaluation of the situation.

In the two decades 1919-39, available data
with regard to Britain as a whole point to the size
of the completed family, interpreted within the
framework of the cohort approach, as steadily
decreasing without major fluctuations. Meanwhile
the mean age of marriage and the proportion of
married both remained at much the same level and
in any case were likewise subject to no major
fluctuation in Britain. From the introduction of
conscription in 1938 we enter on a new period.
Nuptiality rates undergo drastic fluctuations and
annual total births within the next decade do like-
wise. Though, for the reasons stated, we have no
comprehensive numerical data before the Population
Statistics Act (1938), we have very good reason
for believing that the fertility pattern was also
unstable, in the sense that any appropriate index of
spacing of births would also have disclosed abrupt
changes. Thus we have a situation in which all the
difficulties disclosed by the heuristic approach
leading up to Table III coexist with the concomitant
difficulties arising from the fact that war itself 
entails temporary but considerable changes in the 
age-composition of the civilian population. Such 
then is the situation which has understandably 
provoked scepticism with reference to forecasts 
based on the experience of the inter-war years. Such 
also is the situation which we can expect to clarify 
only by the approach outlined in Sections 1 and 2.
We shall first attempt (in Section 4 below) to 
bring into perspective the significance of the dis- 
putable immediate post-war birth boom. We shall 
then attempt (in Section 5) to evaluate current 
(i.e. 1939-50) fertility in terms of replacement by an 
index referable to the cohort procedure which we 
employ throughout the succeeding sections. It 
will then be appropriate in Section 6 to use the 
cohort method to disclose what is the size of the 
completed family referable to different ages at 
marriage. Finally, in Section 7, we shall review 
an issue first raised by Kuczynski in relation to the 
possible consequences of encouragement of early 
marriage under the Nazi régime, i.e. whether a
seemingly sustained lower age at marriage
will suffice to restore fertility to replacement level.


(4) EVALUATION OF THE POST-WAR BIRTH BOOM


The first duty of a summarizing index is to 
summarize. In this sense crude live birth, marriage, 
and reproduction rates are valuable indices in so far 
as they condense information relevant to a stable 
situation. A second and not subsidiary requirement 
is to disclose underlying trends in a changing 
situation. The obvious solution of averaging by one 
method or another is rarely satisfactory. If a
situation has changed, it is desirable to know 
the fact as early as possible, yet a short-term average 
is ephemeral in a period when indices are rapidly 
fluctuating and a long-term average is unrevealing. 
Though the 1947 birth boom was hailed as a 
substantial change in fertility habits, we shall now 
see that the change in family building accounts for a 
great part of the boom. Despite the fact that the 
1947 cohort is of somewhat higher fertility than in 
recent years, fertility has fallen ever since. Such is 
the argument we shall now seek to substantiate. 

Fig. 1 (p. 227) shows the great variations in both
fertility and marriage indices that occurred during the
period. With a falling family size more variation
in fertility rates is to be expected, since first births,
which follow rapidly after marriage, become
increasingly important, and the arithmetical “damping”
of oscillations which the necessarily delayed
higher parities may produce disappears. Any
variation of marriage incidence will thus affect 
the crude live birth rate more than was previously 
true. 
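The "damping" argument can be sketched as a weighted sum of past marriages. The marriage series is hypothetical, with a rise followed by a compensatory fall. Two hypothetical spacing patterns are compared, each with one birth per marriage. In the first, every birth falls in the year after marriage, as first births tend to. In the second, births are spread evenly over the first four years, as a mixture including delayed higher parities would be. Spreading the same births over several lags narrows the annual swing in births.

```python
def births_from_marriages(marriages, lag_weights):
    """births[t] = sum over lags of weight * marriages[t - lag]; the weights are
    (hypothetical) births per marriage occurring at each lag in years."""
    births = [0.0] * len(marriages)
    for t in range(len(marriages)):
        for lag, w in lag_weights.items():
            if t - lag >= 0:
                births[t] += w * marriages[t - lag]
    return births

def swing(series, start=4):
    """Range (max - min) of the series once all lags are under way."""
    window = series[start:]
    return max(window) - min(window)

marriages = [100] * 5 + [200] * 2 + [50] * 4 + [100] * 6
concentrated = births_from_marriages(marriages, {1: 1.0})
spread = births_from_marriages(marriages, {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25})
print("swing, births concentrated after marriage:", swing(concentrated))
print("swing, births spread over four years:", swing(spread))
```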

To exhibit the fluctuations in annual fertility, 
Fig. 3. Cohort live birth rates by duration of marriage, arranged by cohort.
TABLE IV

YEARLY PERCENTAGE CHANGE IN COHORT FERTILITY RATES
(Mothers aged 21-25 years at marriage)

                                        Cohort
Year   1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949
First Parity | | 
1940 13 — — — — — — — — — — 
1941 - $1 + 4 | — — — —_ —— — — — 
1942 - 14 32 | +14 _ = = = ate = = 
1943 — 27 29 «COI 38 + 21 — — — — — — — 
1944 33 | 32 33 42 + 18 — — — — — 
1945 — 29 | — 40 40 | 44 56 0 — — — — — 
1946 0 + 37 + 34 + 19 15 44 + 24 — — - — 
1947 34 37 30 20 9 27 45 + 32 — ~— ~- 
1948 53 | 58 | - aa 59 58 54 53 62 17 — — 
1949 — 43 — 50 44 41 41 43 45 58 18 — 
1950 — — 30 33 40 38 39 41 44 - 56 14 
Second Parity | 

1941 +7 | — | = -_ _ = _ _ _ ~ — 
1942 ‘me i 4 155 — — — — — — — — -- 
1943 3 | + 14 188 — —_— — — — — — — 
1944 +12 | + 23 + 47 + 221 — — — — — — — 
1945 36 20 — 14 : 7 177 — — — - — — 
1946 + 43 + 29 + 24 + 22 30 221 — — — ae oa 
1947 22 14 | 0 + 10 + 15 29 271 — - - as 
1948 40 — 36 32 25 18 14 + 10 222 = — — 
1949 — 29 29 26 - 45 15 13 0 136 — ~- 

26 20 — 15 4 100 -- 


1950 | — _ 19 21 31 





we have related fertility rates to the year in which 
they occurred (Fig. 2). Thus each vertical column 
contains rates due to different cohorts, but represents 
the fertility rates of any one calendar year (each 
vertical box refers to rates produced by different 
cohorts). Fig. 3, where rates for the same cohort in 
successive years are vertically imposed, shows a much 
more regular appearance. The rapid fluctuations of 
Fig. 2, which is already standardized for fluctuation 
in marriage numbers, virtually disappear, and 
reasonable prognostication becomes possible.* 

It is now clear that external events affect not only 
marriage, but also fertility. Whereas marriage in 
Gt. Britain is committal in the sense that it cannot 
legally occur more than once without the intervention 
of a protracted interval, the births produced in one 
year do indeed affect future occurrences of a like 
nature in the same family. We may, therefore, 
speak meaningfully of births as postponed or 
expedited. Thus Fig. 2 shows that the high values 
of 1947 fell rapidly in 1948—a change which should 
be fairly equally distributed over all cohorts (Fig. 3). 

Table IV classifies the changes in cohort habits by 
year of occurrence for the most important age group 
by exhibiting the percentage fall or rise of succeeding 
years relative to the immediately preceding year. 
Algebraically the values tabulated are: 

$$
100 \times \frac{(\text{Specific Fertility Rate of Cohort } y\text{, Duration } n+1) - (\text{Specific Fertility Rate of Cohort } y\text{, Duration } n)}{\text{Specific Fertility Rate of Cohort } y\text{, Duration } n}
$$


* In Fig. 2 approximate values referring to complete cohorts for 
marriages before 1939 have been inserted. They are not relevant 
to Fig. 3 and do not appear there. 


Here, therefore, a calendrical change affecting all 
cohorts will appear as a horizontal change. Clearly 
a change occurred in all parities and all cohorts 
in the 1946-48 period. In 1946 many values are 
reversed in sign, and the negative values are much 
smaller than customary. Some values are positive 
in 1947, but after this date all except one are nega-
tive.† It is clear, therefore, that 1946 and 1947
showed a reversal of trends at all durations of marriage
in all cohorts. By 1948 this effect has disappeared 
and a more normal situation prevails. 
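As an illustration of the percentage change defined above, the following Python sketch applies it to one cohort's yearly rates. The input is the 1939 cohort married at 16-20, taken from Table VI purely as an example (the text does not say which age group Table IV uses); the function name `year_on_year_change` is hypothetical.

```python
def year_on_year_change(rates):
    """Percentage change of each duration's specific fertility rate relative
    to the immediately preceding duration of the same cohort (Table IV form).
    Element i compares duration i + 2 with duration i + 1."""
    return [100.0 * (later - earlier) / earlier
            for earlier, later in zip(rates, rates[1:])]

# Yearly live births per 1,000 marriages (inferred unit), 1939 cohort
# married at 16-20, durations 1-10 (Table VI).
cohort_1939 = [465, 283, 230, 209, 181, 179, 144, 182, 154, 118]

for duration, change in enumerate(year_on_year_change(cohort_1939), start=2):
    # Duration n of a cohort married in year c falls in calendar year c + n - 1.
    print(f"duration {duration:2d} (calendar {1939 + duration - 1}): {change:+6.1f}%")
```

For this cohort the only rise after the first year falls at duration 8, i.e. calendar 1946, which is the reversal of sign the text describes.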


We now compare the cumulative effects of fertility. 
Tables V and VI (overleaf) show live birth rates 
classified by duration of marriage and cohort year. 
In these Tables the values for the same duration 
of each cohort appear on the same horizontal line. 
First are shown the cumulative rates (V) and after 
this the actual yearly rates (VI). The cumulative 
rates show how far in advance or in arrears is 
the fertility of any particular cohort at any time, 
and the yearly rates show in which years the changes 
which led to this situation took place. Thus Table V 
will show the effect and Table VI the incidence of 
changes in fertility. 

Two types of comparison are admissible. First 
we may postulate a norm and measure the difference 
between each cohort and this norm. Alternatively 
we may measure the differences between different 
cohorts. To represent the norm we take a simple 


† The final horizontal value on each line is the percentage change
in the second year of marriage and, being heavily affected by its 
proximity to marriage, should be ignored in this argument. 
TABLE V
CUMULATIVE FERTILITY RATES (TOTAL BIRTHS)

Duration of                                  Cohort
Marriage   1939   1940   1941   1942   1943   1944   1945   1946   1947   1948   1949   1950  Average

Age Group 16-20
       1    465    324    256    255    275    299    284    295    400    416    430    410      342
       2    748    602    521    525    572    579    577    645    763    777    760      -      643
       3    978    843    728    744    779    810    853    911  1,051  1,042      -      -      874
       4  1,187  1,046    919    924    998  1,057  1,077  1,139  1,287      -      -      -    1,070
       5  1,368  1,242  1,073  1,146  1,247  1,260  1,269  1,334      -      -      -      -    1,242
       6  1,547  1,400  1,279  1,382  1,433  1,428  1,429      -      -      -      -      -    1,414
       7  1,691  1,620  1,484  1,550  1,596  1,568      -      -      -      -      -      -    1,585
       8  1,873  1,816  1,626  1,681  1,720      -      -      -      -      -      -      -    1,743
       9  2,027  1,954  1,735  1,792      -      -      -      -      -      -      -      -    1,877
      10  2,145  2,066  1,826      -      -      -      -      -      -      -      -      -    2,012

Age Group 21-25
       1    232    189    177    182    203    237    216    235    313    296    266    246      233
       2    464    406    396    419    463    497    505    575    617    585    530      -      496
       3    630    600    574    612    635    713    747    803    845    795      -      -      695
       4    797    771    743    761    818    928    934    996  1,037      -      -      -      865
       5    948    937    874    947  1,019  1,090  1,093  1,162      -      -      -      -    1,009
       6  1,100  1,067  1,050  1,137  1,167  1,230  1,229      -      -      -      -      -    1,140
       7  1,217  1,241  1,215  1,271  1,299  1,344      -      -      -      -      -      -    1,264
       8  1,362  1,392  1,330  1,382  1,400      -      -      -      -      -      -      -    1,373
       9  1,484  1,497  1,415  1,471      -      -      -      -      -      -      -      -    1,467
      10  1,569  1,579  1,488      -      -      -      -      -      -      -      -      -    1,545

Age Group 26-30
       1    180    155    160    164    185    220    189    224    283    256    228    220      207
       2    380    337    355    374    421    455    461    535    559    521    467      -      442
       3    517    510    521    552    581    657    670    743    760    710      -      -      622
       4    664    666    674    689    754    844    831    916    924      -      -      -      774
       5    802    813    795    853    922    990    968  1,060      -      -      -      -      900
       6    938    930    945  1,005  1,048  1,106  1,079      -      -      -      -      -    1,007
       7  1,044  1,077  1,080  1,118  1,160  1,195      -      -      -      -      -      -    1,112
       8  1,166  1,199  1,175  1,206  1,238      -      -      -      -      -      -      -    1,197
       9  1,266  1,284  1,243  1,269      -      -      -      -      -      -      -      -    1,265
      10  1,335  1,346  1,298      -      -      -      -      -      -      -      -      -    1,326

Age Group 31-35
       1    174    147    157    157    172    191    178    192    222    209    190    188      181
       2    350    313    337    348    378    394    417    454    450    422    390      -      387
       3    473    446    479    493    516    564    585    622    600    560      -      -      534
       4    591    568    607    609    653    705    716    755    719      -      -      -      658
       5    700    683    707    727    773    806    818    861      -      -      -      -      759
       6    799    768    809    825    861    884    896      -      -      -      -      -      835
       7    872    851    891    894    926    937      -      -      -      -      -      -      895
       8    938    914    946    940    966      -      -      -      -      -      -      -      941
       9    986    953    976    967      -      -      -      -      -      -      -      -      970
      10  1,014    976    995      -      -      -      -      -      -      -      -      -      995

Last column: average over all cohorts observed at that duration.


average over the period. If, in fact, the 1947 cohort 
represents a new level of fertility likely to be followed 
by successive cohorts, the norm will not be meaning- 
ful, but as we have seen this is unlikely. 

For comparison of the alternative type, we assume 
that a cohort for which we have more data has 
reached a more stable position. We shall, therefore, 
compare the 1939 cohort with other cohorts. It 
should be noted that the averages of the lower
durations here necessarily refer to larger numbers of cohorts.
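Both types of comparison can be made mechanically from Table V. The Python sketch below, using the 16-20 age group as an example, takes the norm to be the simple unweighted mean of all cohorts observed at each duration (this reproduces the printed average column after rounding, e.g. 342 at duration 1 and 2,012 at duration 10). It then reports the 1939 cohort's distance from the norm and how many later cohorts stand above 1939. The function name `duration_norms` is hypothetical.

```python
# Cumulative live births per 1,000 marriages (inferred unit), women married
# at 16-20 (Table V). cumulative[d - 1] lists cohorts 1939, 1940, ... observed
# at duration d; later cohorts drop out as the duration lengthens.
cumulative = [
    [465, 324, 256, 255, 275, 299, 284, 295, 400, 416, 430, 410],
    [748, 602, 521, 525, 572, 579, 577, 645, 763, 777, 760],
    [978, 843, 728, 744, 779, 810, 853, 911, 1051, 1042],
    [1187, 1046, 919, 924, 998, 1057, 1077, 1139, 1287],
    [1368, 1242, 1073, 1146, 1247, 1260, 1269, 1334],
    [1547, 1400, 1279, 1382, 1433, 1428, 1429],
    [1691, 1620, 1484, 1550, 1596, 1568],
    [1873, 1816, 1626, 1681, 1720],
    [2027, 1954, 1735, 1792],
    [2145, 2066, 1826],
]

def duration_norms(table):
    """Norm: simple (unweighted) mean over all cohorts observed at each duration."""
    return [sum(row) / len(row) for row in table]

norms = duration_norms(cumulative)
for d, row in enumerate(cumulative, start=1):
    gap_to_norm = row[0] - norms[d - 1]            # first type: 1939 cohort vs the norm
    above_1939 = sum(v > row[0] for v in row[1:])  # second type: later cohorts vs 1939
    print(f"duration {d:2d}: norm {norms[d - 1]:7.1f}, "
          f"1939 minus norm {gap_to_norm:+6.1f}, later cohorts above 1939: {above_1939}")
```

Under this norm the 1939 cohort is above average at every duration, which matches the observation about that cohort that follows.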

Perhaps the most striking features of Tables V 
and VI are the high fertility and distinctive character 
of the 1939 sub-cohort of women married in the age 
group 16-20. It is different both from later cohorts
married at the same age and from those married 
in the same year at different ages. The very high 
fertility in the first year of marriage has been suffi- 
cient to keep this cohort higher than the average 


throughout the whole period.* We may only speculate
about the reason for this. It may, for instance, have
been due to the 1938 political situation: possibly to
the conscription in 1939 of the 21-year-old male age
group, to relaxation after the 1938 war scare,
or to belief in “peace in our time”. The highest
incidence actually seems to have been in the third
quarter of the year, but no returns are available
for occurrences in months or quarters by age of
mother. Certainly it was not due to the outbreak
of war, which occurred far too late in 1939 to have
any effect upon the fertility of that year.

It is also noticeable that in the 21-25 age group 
the first year of the 1939 cohort showed an extremely 
high rate of fertility. The same is not true of the
older age groups. It is interesting to see that the 1939


* If the averages are re-calculated after the 1946-48 cohorts have 
experienced ten years of marriage this may not necessarily be true. 
TABLE VI
YEARLY FERTILITY RATES (TOTAL BIRTHS)

Duration of                                  Cohort
Marriage   1939   1940   1941   1942   1943   1944   1945   1946   1947   1948   1949   1950  Average

Age Group 16-20
       1    465    324    256    255    275    299    284    295    400    416    430    410      342
       2    283    278    265    270    297    280    293    350    363    361    330      -      306
       3    230    241    207    219    207    231    276    266    288    265      -      -      243
       4    209    203    191    180    219    247    224    228    236      -      -      -      215
       5    181    196    154    222    249    203    192    195      -      -      -      -      199
       6    179    158    206    236    186    168    160      -      -      -      -      -      185
       7    144    220    205    168    163    140      -      -      -      -      -      -      173
       8    182    196    142    131    124      -      -      -      -      -      -      -      155
       9    154    138    109    111      -      -      -      -      -      -      -      -      128
      10    118    112     91      -      -      -      -      -      -      -      -      -      107

Age Group 21-25
       1    232    189    177    182    203    237    216    235    313    296    266    246      233
       2    232    217    219    237    260    260    289    340    304    289    264      -      265
       3    166    194    178    193    172    216    242    228    228    210      -      -      203
       4    167    171    169    149    183    215    187    193    192      -      -      -      181
       5    151    166    131    186    201    162    159    166      -      -      -      -      165
       6    152    130    176    190    148    140    136      -      -      -      -      -      153
       7    117    174    165    134    132    114      -      -      -      -      -      -      139
       8    145    151    115    111    101      -      -      -      -      -      -      -      125
       9    122    105     85     89      -      -      -      -      -      -      -      -      100
      10     85     82     73      -      -      -      -      -      -      -      -      -       80

Age Group 26-30
       1    180    155    160    164    185    220    189    224    283    256    228    220      205
       2    200    182    195    210    236    235    272    311    276    265    239      -      238
       3    137    173    166    178    160    202    209    208    201    189      -      -      182
       4    147    156    153    137    173    187    161    173    164      -      -      -      161
       5    138    147    121    164    168    146    137    144      -      -      -      -      146
       6    136    117    150    152    126    116    111      -      -      -      -      -      130
       7    106    147    135    113    112     89      -      -      -      -      -      -      117
       8    122    122     95     88     78      -      -      -      -      -      -      -      101
       9    100     85     68     63      -      -      -      -      -      -      -      -       79
      10     69     62     55      -      -      -      -      -      -      -      -      -       62

Age Group 31-35
       1    174    147    157    157    172    191    178    192    222    209    190    188      181
       2    176    166    180    191    206    203    239    262    228    213    200      -      206
       3    123    133    142    145    138    170    168    168    150    138      -      -      147
       4    118    122    128    116    137    141    131    133    119      -      -      -      127
       5    109    115    100    118    120    101    102    106      -      -      -      -      109
       6     99     85    102     98     88     78     78      -      -      -      -      -       90
       7     73     83     82     69     65     53      -      -      -      -      -      -       71
       8     66     63     55     46     40      -      -      -      -      -      -      -       54
       9     48     39     30     27      -      -      -      -      -      -      -      -       36
      10     28     23     19      -      -      -      -      -      -      -      -      -       23

Last column: average over all cohorts observed at that duration.


cohort keeps ahead in several age groups, thus 
suggesting that it may ultimately represent a maxi- 
mum for the period. For example, the 26-30 age 
group of the 1941 sub-cohort, which started lower 
than 1939, twice reached a higher value only to 
be surpassed by the 1939 cohort. It is true that the 
1939 cohort is quite often less fertile in comparison 
with the very recent cohorts during the earlier years 
of marriage, particularly for the higher age groups. 
It is clear, however, that in general 1947 represents 
a peak, and in several cases the most recent values 
are similar to those of 1939. 

It is now of interest to separate the cumulative 
values to see in which years cohorts differed from 
the average for the whole period covered and from 
the marriages of 1939. Table VI now shows with 
the same construction yearly total live birth rates, 
by age at marriage, duration of marriage, and 


cohort year. Apart from the 1939 cohort, which is a 
case sui generis, the cohorts fall again into two types: 
(i) pre-1946, 
(ii) 1946 and after. 

All the pre-1946 cohorts have a period at the
start when they are below the average for duration, 
followed by a period, usually of 2 years, when they 
are higher, which is followed by another low period 
which always includes 1949 and often 1948. The 
1946 and later cohorts are with one exception 
always higher than the average. This fact again 
clearly demonstrates the experience of 1946 and 1947 
as a temporary phase. All cohorts without exception 
are affected, and all cohorts having more than 
4 years’ duration recorded show a low value for 
succeeding fertility. In addition many show at 
least one year of preceding fertility lower than
the 1939 cohort.
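Table VI can be recovered from Table V by differencing each cohort's cumulative rates, and the above/below pattern just described can then be read off against the duration averages. The Python sketch below uses the 1941 cohort married at 16-20 as an illustrative case; the function and variable names are hypothetical.

```python
# Cumulative live births per 1,000 marriages (inferred unit), 1941 cohort
# married at 16-20 (Table V), durations 1-10.
cumulative_1941 = [256, 521, 728, 919, 1073, 1279, 1484, 1626, 1735, 1826]
# Average yearly rate at each duration over all cohorts (Table VI, last column).
yearly_average = [342, 306, 243, 215, 199, 185, 173, 155, 128, 107]

def yearly_from_cumulative(cum):
    """Recover yearly rates (Table VI) by differencing cumulative rates (Table V)."""
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]

yearly_1941 = yearly_from_cumulative(cumulative_1941)
# One symbol per duration, i.e. per calendar year 1941-1950:
# "+" above the duration average, "-" at or below it.
pattern = "".join("+" if y > avg else "-" for y, avg in zip(yearly_1941, yearly_average))
print(yearly_1941)
print(pattern)
```

For this cohort the differenced values agree with Table VI, and the two above-average years fall at durations 6 and 7 (calendar 1946 and 1947), bracketed by below-average years that include 1948 and 1949.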
In discussing the cumulative rates, we saw that
for the 1939 cohort married at the age of 16-20,
the high initial year is followed by 6 years of low 
fertility; yet, as we have seen, the final value is, and 
in fact all cumulative values are, higher than the 
duration averages. The 1939 cohort married at the 
age of 21-25 behaves in a somewhat different way, 
and loses place for durations of less than 9 years. 
For the ninth year it is actually above average. We 
may thus say that the post-war boom had only a 
temporary effect on all cohorts of earlier date 
than 1946, and that subsequently they show no 
evidence of any new trend. We have little evidence 
from which to estimate the future fertility of the 
later cohorts (1946 and after), but they show less 
tendency to be abnormally high after 1947. Thus 
1947 seems to be a peak and not a new plateau, 
and subsequent fertility, though high, falls pro- 
portionately from the 1946 and 1947 level. 


(5) EXTRAPOLATION OF THE COHORT FERTILITY 
RATES AND A COHORT REPLACEMENT INDEX 
For two reasons it is not possible to determine 
fertility rates accurately beyond the tenth year of 
marriage in the period dealt with. First, the relevant 
statistics originated in 1939 (the first full year), so
that only for the earliest cohort are data for a longer
period available. Secondly, data published for
durations of marriage over 10 years are grouped in
quinquennia.

Accordingly, we shall attempt to extrapolate the 
known values with due regard to two criteria: 


(a) the fitted curve must have its maximum at the end 
of the reproductive period (approx. 45 years of age); 

(b) the fitted curve must incorporate relevant informa- 
tion with reference to fertility experience in the period 
covered. 

The result is best shown graphically, and in 
Fig. 4 fertility referable to each cohort for the 
group married at the age of 16-20 is plotted on 
the same graph. Similar graphs prepared for each 
age group show the same pattern. 


All attempts to fit a single curve to the entire 
10-year figures were unsuccessful in that they either 
showed a maximum inconsistent with (a) above, or 
distorted the known fertility pattern. However, 
taking each group by age at marriage separately, 
a quadratic parabola fitted to the average of the 
cumulative values of all cohorts at durations of 7, 8,
9, and 10 years satisfied both conditions.
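The text does not spell out how criterion (a) was enforced in the quadratic fit. One way, sketched here in Python under stated assumptions, is to fix the vertex of the parabola at the duration corresponding to age 45 and fit only the height and curvature by least squares. The assumed mean age at marriage of 18 for the 16-20 group and the function name are hypothetical choices, not the authors'.

```python
# Average cumulative live births per 1,000 marriages at durations 7-10,
# women married at 16-20 (Table V, average column).
durations = [7, 8, 9, 10]
averages = [1585, 1743, 1877, 2012]

def fit_parabola_with_fixed_maximum(xs, ys, x_max):
    """Least-squares fit of y = k - c * (x - x_max) ** 2.

    Fixing the vertex at x_max builds criterion (a) into the fit; k is then
    the extrapolated completed fertility and c > 0 is the curvature.
    """
    us = [(x - x_max) ** 2 for x in xs]
    n = len(xs)
    u_bar, y_bar = sum(us) / n, sum(ys) / n
    s_uy = sum((u - u_bar) * (y - y_bar) for u, y in zip(us, ys))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    slope = s_uy / s_uu            # regression of y on u; slope = -c
    k = y_bar - slope * u_bar
    return k, -slope

# Hypothetical assumption: mean age at marriage 18, so age 45 is reached
# after about 27 years of marriage.
ASSUMED_AGE_AT_MARRIAGE = 18
x_max = 45 - ASSUMED_AGE_AT_MARRIAGE
k, c = fit_parabola_with_fixed_maximum(durations, averages, x_max)
print(f"completed fertility about {k / 1000:.2f} births per marriage (c = {c:.3f})")
```

With these assumptions the sketch gives roughly 3.1 births per marriage, somewhat below the 3.309 shown for this group in Table VII, so the authors' procedure evidently differed in detail (for instance in the assumed vertex).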


Although for those married young less than
half their fecund married life is complete by the
end of the first 10 years' duration of marriage,
it is clear that this covers by far the most substantial
part of fertile married life. For those married older
this is even more true. The influence of tendencies
to smaller family size may accelerate this trend.

Fig. 4. Cumulative fertility rates for all cohorts married at the age of 16-20 years.


REPLACEMENT INDEX.—It is now possible to con- 
struct an index appropriate to the current tempo 
of changing fertility and nuptiality. By multiplying 
the percentage married at each age during the 
period by the extrapolated number of live births 
which each age group will produce by the end of 
its fertile life we have a weighted Index of Marital 
Fertility (Table VII). This index is independent 
of shifts in numbers married, since it is based on 
proportions. It is strictly applicable only to the 
period under discussion, but the subsequent exten- 
sion of the cohorts, as more data are available, will 
permit a revaluation of the trend of fertility at 
yearly intervals. 
TABLE VII
COHORT INDEX OF MARITAL REPLACEMENT

Age at            Total Births    Marriages at Given Age       Product
Marriage  (from extrapolation)    (per cent. for period)
16-20                    3.309                      19.7       65.1873
21-25                    2.259                      46.6      105.2694
26-30                    1.605                      19.3       30.9765
31-35                    1.006                       7.7        7.7462
36-40                    0.450                       4.2        1.8900
41-45                    0.075                       2.5        0.1875
Total                                              100.0      211.2569

Sex ratio 0.4852                    Replacement index 1.025
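The arithmetic of Table VII can be checked directly. In the Python sketch below, the "sex ratio" of 0.4852 is interpreted as the proportion of live births that are female (an interpretation, not stated explicitly in the text). The sum of products, divided by 100, gives the weighted mean number of births per marriage, and multiplying by that proportion gives female births per married woman. The last line computes the mean family size needed for exactly one daughter per married woman, which is why the text elsewhere requires an average of over two children per completed marriage.

```python
# Table VII inputs: extrapolated completed live births per marriage and the
# percentage of marriages at each age during 1939-50.
completed_births = {"16-20": 3.309, "21-25": 2.259, "26-30": 1.605,
                    "31-35": 1.006, "36-40": 0.450, "41-45": 0.075}
percent_married = {"16-20": 19.7, "21-25": 46.6, "26-30": 19.3,
                   "31-35": 7.7, "36-40": 4.2, "41-45": 2.5}
# Assumed meaning of the "sex ratio" in Table VII: share of births that are girls.
FEMALE_PROPORTION = 0.4852

products = {age: completed_births[age] * percent_married[age] for age in completed_births}
total_product = sum(products.values())        # about 211.2569, the Table VII total
births_per_marriage = total_product / 100     # weighted mean completed family size
replacement_index = births_per_marriage * FEMALE_PROPORTION
# Mean family size giving one daughter per married woman (ignoring women who never marry).
threshold = 1 / FEMALE_PROPORTION
print(f"product total {total_product:.4f}, index {replacement_index:.3f}, "
      f"threshold {threshold:.2f} births per marriage")
```

The index reproduces the printed 1.025, and the threshold comes out at about 2.06 births per marriage.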


Whilst the index is at present based on an extra- 
polation of the known data, the relative steadiness 
of the cohorts is a measure of its value. The gross 
replacement rate so derived does not present an 
encouraging picture of replacement but substantiates 
the previous analysis, i.e. the experience of 1939-50 
does not register a reversal of fertility trends. The 
mean figure of 1.02 female children would be just
above replacement if all women married. It is not 
sufficient to compensate for the failure of some 
women to marry, although being referable to the 
completed fertility of all women married at existing
mortality rates, the figure cited compensates for 
married women who die before reaching the end 
of fecund life. 

Fig. 4 shows that the 1947 and later cohorts are 
above the average in fertility, but the yearly decrease 
of total fertility from the peak in the first year of 
marriage is in general steeper than for earlier 
cohorts. There are, therefore, three admissible 
conclusions: 

(i) that the higher initial values of more recent 
years will be compensated by a decrease in 
later years. 

(ii) that their total fertility will eventually be 
higher, although the gradient at later durations 
may not differ from that of the earlier cohort
years at those durations. 

(iii) that fertility of the later cohorts (1947-49) 
will be above that of the earlier cohort values, 
with regard to both height and gradient. 

A reason for favouring conclusion (ii) is that a
projected average as high as that prescribed
by our third assumption has never been experienced
in the history of the period. Further, the rate
of decrease at low durations is higher than in most 
previous cohorts, and seemingly compensates for 
the initial high level. The tendency since 1947 has 
been one of falling fertility; and this is most likely 
to be continued in marriages of higher duration, 
thus possibly causing a more than proportionate 
subsequent fall. It would therefore seem that active 
measures are necessary if the present generation of 
mothers are to replace themselves. 


(6) FAMILY SIZE 

From the cohort fertility rates, we can extract, 
for given durations of marriage, the size of family in 
terms of live births of current marriages, i.e. without 
reference to children by a previous marriage. Whilst 
this is of ulterior interest, it also makes possible 
an analysis of the progress of each cohort and an 
estimation of the present birth situation in terms of 
completed fertility. 

Table VIII (overleaf) exhibits the history of cohorts 
married during the most common age range (21-25) 
in the years 1939-50, at successive stages of the 
duration of marriage throughout the decennium; 
each entry exhibits the proportions of families of 
given size referable to a particular cohort after a 
specified duration of marriage. 

It is noticeable that only 5 per cent. of families 
even at high duration (e.g. after 10 years of marriage) 
have more than three children, and that approxi-
mately 20 per cent. have none, the mean number of
children at this duration being approximately 1.5.
Whilst as yet we have no break-down by parity
for the extrapolation (vide supra), it is unlikely that
many of the 20 per cent. infertile marriages will
in fact produce a live-born child, and we should here
note that the very much smaller proportion of infer-
tile marriages in the smaller group married earlier
(aged 16-20 at marriage; not shown here) is in great
part due to the large proportion of pre-nuptial
conceptions. Indeed, these marriages have produced
an average of 2.0 children after 10 years' marriage.
The mean percentage of infertile marriages at selected
durations for the group aged 21-25 is as follows:





Duration of Marriage (years)    2    4    6    8   10
Percentage Infertile           54   35   27   23   21
TABLE VIII
SIZE OF FAMILY (LIVE BIRTHS)
Cohorts compared at Same Durations of Marriage (Mothers aged 21-25 at marriage)

Duration of              Number of Live Births by Present Marriage (per cent.)
Marriage
(yrs)     Cohort      0      1      2      3      4      5  6 and over

       1    1939   77.3   22.4    0.3      -      -
            1940   81.5   18.3    0.2      -      -
            1941   82.6   17.3    0.1      -      -
            1942   82.1   17.8    0.1      -      -
            1943   80.1   19.7    0.2      -      -
            1944   76.8   23.0    0.2      -      -
            1945   78.9   20.9    0.2      -      -
            1946   77.0   22.8    0.2      -      -
            1947   69.3   30.4    0.3      -      -
            1948   71.2   28.4    0.4      -      -
            1949   74.1   25.5    0.4      -      -
            1950   75.6   24.1    0.3      -      -

       2    1939   57.6   39.0    3.2    0.2      -
            1940   62.3   35.3    2.3    0.1      -
            1941   62.7   35.5    1.8      -      -
            1942   60.5   37.5    2.0      -      -
            1943   56.5   41.0    2.3    0.1      -
            1944   53.5   43.9    2.5    0.1      -
            1945   52.8   44.6    2.5    0.1      -
            1946   46.7   49.9    3.3    0.1      -
            1947   43.7   51.5    4.7    0.1      -
            1948   47.5   47.4    4.9    0.2      -
            1949   51.8   43.7    4.3    0.2      -

       3    1939   47.9   42.6    8.8    0.7      -
            1940   49.2   42.8    7.5    0.5      -
            1941   50.4   42.9    6.4    0.3      -
            1942   48.0   43.9    7.7    0.4      -
            1943   46.2   45.3    8.0    0.5      -
            1944   40.5   49.2    9.7    0.6      -
            1945   38.5   50.0   10.7    0.8      -
            1946   35.3   51.0   12.9    0.8      -
            1947   32.9   51.7   14.3    1.0    0.1
            1948   37.0   48.5   13.2    1.3      -

       4    1939   39.6   44.2   14.1    2.0    0.1
            1940   39.9   45.7   12.8    1.5    0.1
            1941   42.1   44.0   12.5    ...    ...
            1942   41.0   44.4   13.1    1.4    0.1
            1943   37.4   46.2   14.6    1.7    0.1
            1944   31.0   48.8   17.9    2.1    0.2
            1945   31.8   46.9   18.6    2.5    0.2
            1946   29.0   47.0   20.9    2.8    0.3
            1947   26.8   47.6   21.8    3.5    0.3

       5    1939   33.5   43.8   18.6    3.7    0.4      -           -
            1940   33.6   44.1   18.7    3.2    0.4      -           -
            1941   37.1   42.8   17.1    2.7    0.3      -           -
            1942   32.7   44.8   19.1    3.0    0.4      -           -
            1943   29.4   45.1   21.2    3.9    0.4      -           -
            1944   26.6   44.7   23.6    4.5    0.6      -           -
            1945   28.0   42.2   24.2    4.9    0.6    0.1           -
            1946   25.3   41.9   26.4    5.4    1.0      -           -

       6    1939   29.4   40.6   23.0    5.9    1.0    0.1           -
            1940   29.8   41.6   22.8    4.9    0.8    0.1           -
            1941   30.4   41.8   22.3    4.7    0.7    0.1           -
            1942   26.1   42.7   24.9    5.3    0.9    0.1           -
            1943   26.0   41.0   25.7    6.2    1.0    0.1           -
            1944   24.0   40.1   27.7    6.8    1.3    0.1           -
            1945   25.7   37.7   27.6    7.4    1.3    0.3           -

       7    1939   26.5   38.8   24.9    7.8    1.6    0.3         0.1
            1940   24.6   38.7   27.9    7.1    1.4    0.3           -
            1941   25.7   38.8   26.9    7.0    1.3    0.3           -
            1942   23.4   38.9   28.3    7.6    1.5    0.3           -
            1943   24.0   36.6   28.8    8.5    1.7    0.4           -
            1944   22.4   36.4   29.9    8.8    2.1    0.4           -

       8    1939   23.6   35.0   28.3   10.0    2.3    0.6         0.2
            1940   21.3   35.0   31.6    9.4    2.2    0.4         0.1
            1941   23.7   35.6   29.2    9.0    1.9    0.6           -
            1942   21.9   35.6   29.9    9.8    2.1    0.7           -
            1943   22.8   33.4   30.3   10.2    2.5    0.8           -

       9    1939   21.7   31.7   30.3   12.0    3.1    0.9         0.3
            1940   19.9   31.9   33.3   11.1    2.9    0.7         0.2
            1941   22.7   32.9   30.6   10.4    2.5    0.9           -
            1942   20.9   32.8   31.1   11.3    2.8    1.1           -

      10    1939   20.8   29.5   30.9   13.2    3.9    1.2         0.5
            1940   19.1   29.5   34.1   12.5    3.5    0.9         0.4
            1941   22.0   30.6   31.4   11.6    3.1    1.3           -

It is clear that by the end of 10 years the per- 
centage is changing very slowly. 
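The mean family size and the share of large families quoted earlier can be checked against a Table VIII distribution. This Python sketch uses the 1939 cohort at 10 years' duration and scores the open class "6 and over" as exactly 6, an assumption that makes the computed mean a lower bound; the function names are hypothetical.

```python
# Percentage of marriages by number of live births after 10 years,
# 1939 cohort married at 21-25 (Table VIII). Key 6 stands for "6 and over".
distribution_1939 = {0: 20.8, 1: 29.5, 2: 30.9, 3: 13.2, 4: 3.9, 5: 1.2, 6: 0.5}

def mean_family_size(dist):
    """Mean live births per marriage; the open top class is scored at its lower limit."""
    total = sum(dist.values())  # 100 per cent., up to rounding
    return sum(size * pct for size, pct in dist.items()) / total

def share_above(dist, size):
    """Percentage of marriages with more than `size` live births."""
    return sum(pct for k, pct in dist.items() if k > size)

print(f"mean {mean_family_size(distribution_1939):.3f}, "
      f"more than three children: {share_above(distribution_1939, 3):.1f} per cent.")
```

For this cohort the mean is about 1.55 and 5.6 per cent. of families have more than three children, consistent with the approximate figures of 1.5 and 5 per cent. given in the text.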

The form of Table VIII is useful to shed light on 
the recent and extremely disordered situation 
previously discussed. It is quite clear that the 1947 
cohort has relatively high fertility. It is also clear 
that the 1948, 1949, and 1950 cohorts have succes-
sively fallen from this level. The latest information
on the 1947 cohort is at 4 years' duration of mar-
riage, referable to fertility in the calendar year
1950. Here the cohort has the lowest proportion
of infertile marriages: 26.8 per cent. against
42.1 per cent. at this duration for the lowest cohort,
1941. The 1947 cohort is, however, only 2.2 per cent.
lower than the 1946 cohort. The 1947 cohort has
also the highest percentage of 2- and 3-child families
at this duration. The 3-child families at this duration
are a very small proportion, viz., 3.5 against 2.5 per
cent. for 1945 and 2.8 per cent. for 1946; but the
percentage of 2-child families is high, although
again not appreciably greater than for the 1946
cohort, i.e. 21.8 against 20.9 per cent.; and for
1-child families it is 47.6 against 47.0 per cent.

It is here that the cohort system demonstrates 
its claim to prominence in fertility research. For it 
is clear that the high rates of 1947, which misled
many into claiming a reversal in the trend of
fertility, were largely due to a change in the incidence
of births within marriages without any substantial 
increase in quantity. In so far as the 1947 cohort 
is at present of slightly higher fertility than the others, 
this is due to an increase in 2- and 3-child families. 
This tendency is, however, less outstanding in 1950 
than previously. 

For cohorts 1948, 1949, and 1950, we have 
successively less knowledge; but each later cohort 
has successively lower fertility at each duration. At
duration 3, the 1948 cohort is lower than the 1947
cohort for the two significant sizes (1- and 2-child
families). At duration 2, the 1949 cohort has fewer
1-child families than any previous cohort since 1943.
However, the 1949 cohort has slightly more
2-child families than any cohort before 1947.
It seems that there is a slight tendency for the 
2-child family to be more popular, but this is not 
sufficient per se to achieve replacement, since for 
this an average of over two children per completed 
marriage is necessary. 

The 1950 cohort is well below its immediate 
predecessors and very little above the 1939 cohort, 
which, as we shall see later, is itself well below 
replacement level at later durations. If we now 
score these values according to relative size in each 
duration (Table IX) it is clear that some of the war 
years have still some lost ground to make up. Further 
they have been less successful in making up later 
parities than first parity births. Thus the relatively 
high number of births in 1951 and 1952 may be 
due in part to postponed higher parities. When the 
Registrar-General’s Reviews for these years are 
available, it may well be seen that the fall in the
fertility of the 1951 and 1952 cohorts is even greater
than the crude figures suggest.

TABLE IX
RELATIVE POSITION OF COHORTS (TABLE VIII VALUES)
(Constant 12-point grading)
Columns: duration of marriage (years) 1 to 10, and average. Rows: cohorts 1939 to 1950, for first parity and for second parity.

The 1941 cohort will serve to pinpoint this. Its 
first parity values are almost always least until the 
calendar year 1946, when they exceed the 1939 and 
1940 cohorts. For the second parity it gains one 
place (over 1939) in 1947, but is not here pre-
eminent. For the third parity it is always lowest. The
same pattern can be seen in the 1942 cohort. Since
the cohorts are aged 31-35 after 10 years of marriage,
a further increase in fertility is still possible.
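The text does not define its "constant 12-point grading" or say exactly which Table VIII figure underlies each parity score. The Python sketch below shows one hypothetical reading: within a duration, the largest value scores 12, the next 11, and so on, so scores stay comparable even when fewer than 12 cohorts have reached that duration. As an illustrative input it uses the percentage of 1-child families at 6 years' duration (calendar 1946 for the 1941 cohort). The function name and the tie-breaking rule are assumptions.

```python
def grade(values_by_cohort, top=12):
    """Hypothetical 'constant 12-point grading': the largest value at a given
    duration scores `top`, the next `top - 1`, and so on. Ties are broken by
    cohort year (an arbitrary choice)."""
    ordered = sorted(values_by_cohort, key=lambda c: (-values_by_cohort[c], c))
    return {cohort: top - rank for rank, cohort in enumerate(ordered)}

# Percentage of 1-child families at 6 years' duration, mothers married at 21-25
# (Table VIII). Duration 6 of the 1941 cohort is calendar year 1946.
one_child_d6 = {1939: 40.6, 1940: 41.6, 1941: 41.8, 1942: 42.7,
                1943: 41.0, 1944: 40.1, 1945: 37.7}
print(grade(one_child_d6))
```

On this reading the 1941 cohort scores above both the 1939 and 1940 cohorts in 1946. The sketch does not claim to reproduce the values of Table IX.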

Fig. 5 shows the proportions of different family 
sizes for one cohort (1939). We see clearly that 
the effect of later age at marriage is to lower fertility. 
This relationship between age at marriage and 
fertility is discussed in the next section. 
Fig. 5. Family size at end of 10 years' marriage, by age at marriage,
for 1939 cohorts.


(7) AGE AT MARRIAGE AND FERTILITY 


In the 1930s disagreement existed between 
demographers who claimed that a partial reversal 
of the falling birth rate could be achieved by persuad- 
ing enough women to marry earlier, and those who 
claimed that a change of age at marriage would not 
necessarily change fertility habits. The protagonist
of the first view was the Registrar-General (Statistics
...), and of the second, Kuczynski (1942).

TABLE X
CUMULATIVE LIVE BIRTH RATES (TOTAL BIRTHS)
Older Age Groups compared with 16-20 Age Group at Marriage

Cohort             1939                 1940                 1943                 1945
Age Group  16-20  21-25  26-30  16-20  21-25  26-30  16-20  21-25  26-30  16-20  21-25  26-30
Duration
(yrs)
        1    100   49.9   38.7    100   58.3   47.8    100   73.8   67.3    100   76.1   66.5
        2    100   62.0   50.8    100   67.4   56.0    100   80.9   73.6    100   87.5   79.9
        3    100   64.4   52.9    100   71.2   60.5    100   81.5   74.6    100   87.6   78.5
        4    100   67.1   55.9    100   73.7   63.7    100   82.0   75.6    100   86.7   77.2
        5    100   69.3   58.6    100   75.4   65.5    100   81.7   73.9    100   86.1   76.3
        6    100   71.1   60.6    100   76.2   66.4    100   81.4   73.1      -      -      -
        7    100   72.0   61.7    100   76.6   66.5    100   81.4   72.7      -      -      -
        8    100   72.7   62.2    100   76.6   66.0      -      -      -      -      -      -
        9    100   73.2   62.4    100   76.6   65.7      -      -      -      -      -      -
       10    100   73.1   62.2    100   76.4   65.2      -      -      -      -      -      -

TABLE XI

Age group    After 10 years'    Total fertility    Length of  Fertility per year
at marriage  duration of                           fecund     of fecund marriage
             marriage                              marriage
               Number  Per cent.  Number  Per cent.  (Years)  Number  Per cent.
16-20             ...        ...     ...        ...      ...     ...        ...
21-25           1.574       74.8   2.259       68.3       24   0.094         82
26-30             ...        ...   1.605       48.5       19   0.084         74
31-35             ...        ...     ...        ...      ...   0.072         63
36-40           0.436       20.7   0.436       13.2        9   0.048         42
41-45             ...        ...     ...        ...      ...     ...        ...

The issue has a special relevance to the war and
immediate post-war years. During the period
with which this study deals, external circumstances
induced a greater proportion of people to marry at
younger ages. In 1939, the most common age of
spinster marriage was 23. It fell rapidly during
the war, and is now 21. This means that a substan-
tially greater number of women are marrying at
the age of 16-20. This age group is of special
interest because fertility is highest among married 
women assignable to it. Ceteris paribus, and with 
due regard to the prevailing high level of nuptiality, 
a downward shift of the mean age of marriage 
involving an increase in the 16-20 age group of 
married women should result in the relationship 
of the 16-20 group with the other groups remaining 
unchanged. It would also signify an increased total 
fertility. Similar remarks apply to the next youngest 
group, those aged 21-25 at marriage. This is next 
highest in fertility and its proportionate size has 
increased. Table X shows that there has been a 
consistent fall of fertility in the younger age groups 
relative to the older age groups, whereas the argu- 
ments of the Registrar-General had led us to expect 
the situation to remain unchanged. 
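Each entry of Table X is an older group's cumulative rate expressed as a percentage of the 16-20 group for the same cohort and duration. A Python sketch using the 1939 cohort rates from Table V (the function name is hypothetical):

```python
# Cumulative live births per 1,000 marriages (inferred unit), 1939 cohort,
# by age at marriage (Table V), durations 1-10.
cumulative_1939 = {
    "16-20": [465, 748, 978, 1187, 1368, 1547, 1691, 1873, 2027, 2145],
    "21-25": [232, 464, 630, 797, 948, 1100, 1217, 1362, 1484, 1569],
    "26-30": [180, 380, 517, 664, 802, 938, 1044, 1166, 1266, 1335],
}

def relative_to_youngest(rates, base="16-20"):
    """Express each group's cumulative rate as a percentage of the base group
    at the same duration of marriage (the form of Table X)."""
    return {group: [round(100 * r / b, 1) for r, b in zip(series, rates[base])]
            for group, series in rates.items()}

table_x_1939 = relative_to_youngest(cumulative_1939)
print(table_x_1939["21-25"])
print(table_x_1939["26-30"])
```

For the 21-25 group this reproduces the 1939 column of Table X exactly; for the 26-30 group the computed values at durations 8 and 9 differ by 0.1 from those printed.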

We now compare the total fertility of groups 
married at different ages. One might expect greater 
fertility from those married younger, if only because 
they have a longer fecund married life. We shall 
therefore compare fertility when standardized for 
length of marriage. We shall do this first at the 


point where our exact knowledge ends, viz., after 
10 years’ duration of marriage. A similar compari- 
son of fertility by age at marriage for completed 
marriage is made possible by using the previous 
extrapolation. As both Table XI and Fig. 6 show, 
fertility is consistently lower with higher marriage 
age, not only in toto but per year of fecund marriage. 
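Standardizing for length of marriage amounts to dividing completed fertility by the number of fecund married years. The Python sketch below assumes that this length runs from the lower bound of each age group to age 45. That assumption reproduces the 24 and 9 years legible in Table XI for the 21-25 and 36-40 groups. It uses the Table VII totals, and the function name is hypothetical.

```python
# Extrapolated completed live births per marriage (Table VII).
completed = {"16-20": 3.309, "21-25": 2.259, "26-30": 1.605,
             "31-35": 1.006, "36-40": 0.450, "41-45": 0.075}

def fecund_years(age_group, end_of_reproduction=45):
    """Assumed length of fecund marriage: from the lower bound of the age
    group at marriage to the end of the reproductive period."""
    return end_of_reproduction - int(age_group.split("-")[0])

per_year = {g: births / fecund_years(g) for g, births in completed.items()}
base = per_year["16-20"]
for g, rate in per_year.items():
    print(f"{g}: {fecund_years(g):2d} years, {rate:.3f} births/year, "
          f"{100 * rate / base:.0f} per cent. of 16-20")
```

The yearly rates fall steadily with age at marriage. For the 21-25, 26-30, and 31-35 groups they match the 0.094, 0.084, and 0.072 of Table XI. For 36-40 the Table VII total of 0.450 gives 0.050 rather than Table XI's 0.048, which rests on 0.436.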





(8) BIOLOGICAL AND DEMOGRAPHIC VIEWS OF 
FERTILITY 


The controversy raised in Section 7 has implica- 
tions which call for comment. Biologists and demo- 
graphers do not use the term “fertility” in the same 
sense (see p. 226, footnote). Hence confusion is
almost unavoidable if interpretation involves matters 
of judgment concerning the relative importance 
of social agencies and physiological processes or 
states. When the demographer asserts that fertility 
falls off steadily in successive quinquennia of the 
reproductive period (15-45), the statement merely 
expresses the fact that the mean number of children 
born to a fixed number of married women within 
these age groups declines accordingly. It carries 
with it no legitimate implication that we know 
why or in what circumstances. When a physiologist 
speaks of low fertility, he means specifically that 
ovulation is less frequent, that fertilizability of the 
ovum is less, or that the chances of survival in utero 
of the fertilized ovum are lower for reasons inherent 
in the normal reproductive cycle or attributable 
to particular external agencies, such as nutrition. 
On the other hand, any figures referable to the 
incidence of live births to married women of 
different ages in a community which practises 
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family limitation are open to interpretation on 
quite a different level. 

If the community mores prescribe a certain target 
level of family size, it is evident that women who 
marry younger will attain the target level earlier 
than women who marry later. Hence no figures 
which do not take into account duration of marriage 
have any presumptive bearing on whether the 
incidence of live births among married women at 
different ages is attributable to social pressures 
or to the various mechanisms with which biologists 
customarily associate the term fertility. 

In the two extreme age groups of reproductive 
life, it is plausible to assume that physiological 
differences in the broadest sense of the term, includ- 
ing differences with respect to genetic make-up, 
are of no mean importance; but we are not in fact 
able to delimit any age range about which we are 
able to make a definite assertion of this sort. The
intrinsic plausibility of any such assumption has
no bearing whatsoever on the likelihood that a small
shift in the mean age at marriage will have a uniquely
predictable consequence without due regard to the
social pressures operative in a community in which
some measure of birth control is universal.

Fig. 6. Live births, by age at marriage.

Such social pressures are evidently of different 
kinds at different social levels. Thus it is highly 
likely that early marriages in the social groups 
from which the professions largely recruit themselves 
will encourage motherhood as a whole-time career 
whereas delay of marriage would accommodate a
smaller family to a woman’s continuance in another 
vocation. It is equally obvious that different social 
pressures operate in the wage-earning community, 
among women who customarily undertake work 
outside the home when their older children are able 
to assume some responsibility for care of the younger. 
It is indeed safe to presume that the net effect of
social pressures in different groups is different with
respect both to the direction in which it affects
fertility in the demographic sense of the term and
to the extent to which it does so. Though it is not
unlikely that a very big shift of the mean age at
marriage would have foreseeable consequences, it is
by no means admissible that our knowledge of the
agencies determining the size of the completed family
entitles us to express any opinion about the
consequences of raising or lowering the mean age at
marriage by two or three years above or below its
present level in Britain.


SUMMARY 
(1) Fertility indices computed from annual data 
are misleading if the end in view is the assessment 
of fertility under unstable conditions. This com- 
munication describes an alternative method based 
on cohorts of married women and sets forth its 
advantages. 


(2) The method is used to show that the high 
birth rates of 1946-48 were due to a complex of 
causes unrelated to any substantial increase in 
fertility, and that fertility has indeed fallen annually 
since that period. 


(3) An estimate of completed fertility is made 
from which a new index of fertility is computed. 
The use of this index makes it possible to show that 
in England and Wales fertility is now below replace- 
ment level. Moreover fertility is not yet stable, but 
is still falling. 


(4) Fertility, expressed as total fertility or as 
fertility per year of fecund married life, is inversely 
related to age of marriage. Nevertheless the recent 
decrease in age at marriage has not been associated 
with the increase in mean family size anticipated 
by the Registrar-General in 1938. 


(5) The proportions of families of different sizes 
after 10 years of marriage are estimated according 
to age at marriage for women married in England 
and Wales in the period 1939-50. 


Acknowledgments are due to Dr. Enid Charles and 
Professor Lancelot Hogben, F.R.S., for advice and 
criticism. 
TECHNICAL APPENDIX


CALCULATION OF COHORT AGE-SPECIFIC LEGITIMATE LIVE BIRTH RATES BY PARITY 
FOR ENGLAND AND WALES, 1939-50 


All the material for the rates here computed is available
in the Registrar-General's Annual Statistical Review,
Civil Tables, vols 1938-50, and all references to tables
are given in the form in which they are there designated.


(1) The population for the rates comprises spinsters 
married at such a date that they are of the duration of 
marriage stated during the calendar year attached to 
the cohort. If maternities were tabulated by date of 
marriage, and not, as exclusively at present in England 
and Wales, by 12-monthly durations since marriage, 
these populations would be unequivocally determined 
for each duration. In the absence of census data 
recording age and duration of marriage for all women 
the appropriate population can be derived only from 
the record of previous marriages. Several empirical 
solutions have been proposed by the Registrar-General
and others. It can however be shown by means of a
new graphical technique that previous formulae are 
incorrect and a more exact solution has been found. 


(2) Legitimate maternities are taken from Tables
“SS”, “QQ”, “OO”, and “II”, adjustment being made
for the fact that 1939 and 1940 are tabulated by date
of registration and not of occurrence.


(3) Adjustment was made for incomplete data. For
the worst year, 1940, information is complete for 98·6 per
cent. and entirely absent for only 0·28 per cent.


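(4) The quinquennial groups aged 16-20, 21-25,
26-30, . . . to 46-50 at maternity during the first year
of married life were used as basic cohorts. As such they
were clearly defined annually by duration since, e.g., those
aged 21-25 in 1939 in the first year of marriage (duration 1)
became 22-26 in 1940 in the second year of marriage
(duration 2). They are, as we have seen, less easily
described in terms of marriage. Those aged 21 and
less than 22 in 1939 in the first year's duration of marriage
were married at the age of 20 or 21 in either 1938 or 1939.
For this reason, the estimation of the correct population
for each cohort made necessary the estimation of numbers
married by age for each quarter of the year. To
aid typography the Registrar-General's term (n − 1) < n
is herein designated duration n, so that the first
year of married life (Registrar-General's 0 < 1) is
referred to as Year 1. Each successive year the cohorts
were advanced one year in age and one year in duration
of marriage. Thus one cohort in subsequent years was
aged 16-20, 17-21, 18-22, 19-23, 20-24, etc.

APPENDIX TABLE A
LIVE BIRTH RATES PER 1,000 MARRIED WOMEN
(1939 Cohort)

Each cell gives the yearly rate / the accumulated rate.

Age in 1939 | Average cohort age | Duration of marriage | Total | Parity 0 | Parity 1 | Parity 2 | Parity 3 | Parity 4 | Parity 5
16-20 | 18 | 1 | 465 / 465 | 458 / 458 | 3 / 3 | — | — | — | —
16-20 | 19 | 2 | 283 / 748 | 202 / 660 | 78 / 81 | 2 / 2 | — | — | —
16-20 | 20 | 3 | 230 / 978 | 88 / 748 | 129 / 210 | 12 / 14 | 0 / 0 | — | —
16-20 | 21 | 4 | 209 / 1,187 | 56 / 804 | 116 / 326 | 34 / 48 | 3 / 3 | — | —
16-20 | 22 | 5 | 181 / 1,368 | 36 / 840 | 88 / 414 | 47 / 95 | 9 / 12 | 1 / 1 | —
16-20 | 23 | 6 | 179 / 1,547 | 25 / 865 | 80 / 494 | 53 / 148 | 18 / 30 | 2 / 3 | 0 / 0
16-20 | 24 | 7 | 144 / 1,691 | 19 / 884 | 50 / 544 | 49 / 197 | 19 / 49 | 5 / 8 | 1 / 1
16-20 | 25 | 8 | 182 / 1,873 | 20 / 904 | 70 / 614 | 53 / 250 | 25 / 74 | 10 / 18 | 2 / 3
16-20 | 26 | 9 | 154 / 2,027 | 10 / 914 | 49 / 663 | 49 / 299 | 28 / 102 | 13 / 31 | 4 / 7
16-20 | 27 | 10 | 118 / 2,145 | 6 / 920 | 27 / 690 | 36 / 335 | 25 / 127 | 14 / 45 | 6 / 13
16-20 | 28 | 11 | 115 / 2,260 | — | — | — | — | — | —
16-20 | 29 | 12 | 105 / 2,365 | — | — | — | — | — | —
21-25 | 23 | 1 | 232 / 232 | 227 / 227 | 3 / 3 | — | — | — | —
21-25 | 24 | 2 | 232 / 464 | 197 / 424 | 31 / 34 | 2 / 2 | — | — | —
21-25 | 25 | 3 | 166 / 630 | 97 / 521 | 61 / 95 | 5 / 7 | 0 / 0 | — | —
21-25 | 26 | 4 | 167 / 797 | 83 / 604 | 67 / 162 | 14 / 21 | 1 / 1 | — | —
21-25 | 27 | 5 | 151 / 948 | 61 / 665 | 65 / 227 | 20 / 41 | 3 / 4 | 0 / 0 | —
21-25 | 28 | 6 | 152 / 1,100 | 41 / 706 | 73 / 300 | 29 / 70 | 7 / 11 | 1 / 1 | 0 / 0
21-25 | 29 | 7 | 117 / 1,217 | 29 / 735 | 47 / 347 | 28 / 98 | 9 / 20 | 3 / 4 | 1 / 1
21-25 | 30 | 8 | 145 / 1,362 | 29 / 764 | 67 / 414 | 33 / 131 | 11 / 31 | 4 / 8 | 1 / 2
21-25 | 31 | 9 | 122 / 1,484 | 19 / 783 | 52 / 466 | 32 / 163 | 12 / 43 | 4 / 12 | 1 / 3
21-25 | 32 | 10 | 85 / 1,569 | 9 / 792 | 31 / 497 | 25 / 188 | 13 / 56 | 5 / 17 | 2 / 5
21-25 | 33 | 11 | 71 / 1,640 | — | — | — | — | — | —
21-25 | 34 | 12 | 60 / 1,700 | — | — | — | — | — | —
26-30 | 28 | 1 | 180 / 180 | 173 / 173 | 3 / 3 | — | — | — | —
26-30 | 29 | 2 | 200 / 380 | 177 / 350 | 19 / 22 | 1 / 1 | — | — | —
26-30 | 30 | 3 | 137 / 517 | 83 / 433 | 48 / 70 | 4 / 5 | 0 / 0 | — | —
26-30 | 31 | 4 | 147 / 664 | 76 / 509 | 57 / 127 | 10 / 15 | 1 / 1 | — | —
26-30 | 32 | 5 | 138 / 802 | 60 / 569 | 58 / 185 | 14 / 29 | 3 / 4 | 0 / 0 | —
26-30 | 33 | 6 | 136 / 938 | 44 / 613 | 63 / 248 | 21 / 50 | 5 / 9 | 1 / 1 | 0 / 0
26-30 | 34 | 7 | 106 / 1,044 | 28 / 641 | 50 / 298 | 21 / 71 | 5 / 14 | 1 / 2 | 0 / 0
26-30 | 35 | 8 | 122 / 1,166 | 28 / 669 | 55 / 353 | 26 / 97 | 8 / 22 | 3 / 5 | 1 / 1
26-30 | 36 | 9 | 100 / 1,266 | 18 / 687 | 42 / 395 | 25 / 122 | 9 / 31 | 3 / 8 | 1 / 2
26-30 | 37 | 10 | 69 / 1,335 | 9 / 696 | 27 / 422 | 19 / 141 | 9 / 40 | 3 / 11 | 1 / 3
26-30 | 38 | 11 | 45 / 1,380 | — | — | — | — | — | —
26-30 | 39 | 12 | 40 / 1,420 | — | — | — | — | — | —
31-35 | 33 | 1 | 174 / 174 | 163 / 163 | 3 / 3 | — | — | — | —
31-35 | 34 | 2 | 176 / 350 | 150 / 313 | 19 / 22 | 1 / 1 | — | — | —
31-35 | 35 | 3 | 123 / 473 | 71 / 384 | 43 / 65 | 5 / 6 | 1 / 1 | — | —
31-35 | 36 | 4 | 118 / 591 | 52 / 436 | 51 / 116 | 10 / 16 | 2 / 3 | — | —
31-35 | 37 | 5 | 109 / 700 | 42 / 478 | 47 / 163 | 15 / 31 | 3 / 6 | 1 / 1 | —
31-35 | 38 | 6 | 99 / 799 | 28 / 506 | 43 / 206 | 19 / 50 | 6 / 12 | 1 / 2 | —
31-35 | 39 | 7 | 73 / 872 | 18 / 524 | 30 / 236 | 15 / 65 | 6 / 18 | 1 / 3 | 1 / 1
31-35 | 40 | 8 | 66 / 938 | 15 / 539 | 28 / 264 | 14 / 79 | 6 / 24 | 2 / 5 | 1 / 2
31-35 | 41 | 9 | 48 / 986 | 10 / 549 | 19 / 283 | 12 / 91 | 5 / 29 | 2 / 7 | 0 / 2
31-35 | 42 | 10 | 28 / 1,014 | 4 / 553 | 9 / 292 | 7 / 98 | 4 / 33 | 2 / 9 | 1 / 3
31-35 | 43 | 11 | 14 / 1,028 | — | — | — | — | — | —
31-35 | 44 | 12 | — / 1,028 | — | — | — | — | — | —
36-40 | 38 | 1 | 124 / 124 | 113 / 113 | 3 / 3 | — | — | — | —
36-40 | 39 | 2 | 100 / 224 | 82 / 195 | 12 / 15 | 1 / 1 | — | — | —
36-40 | 40 | 3 | 70 / 294 | 39 / 234 | 25 / 40 | 3 / 4 | 1 / 1 | — | —
36-40 | 41 | 4 | 59 / 353 | 24 / 258 | 27 / 67 | 6 / 10 | 1 / 2 | — | —
36-40 | 42 | 5 | 38 / 391 | 13 / 271 | 16 / 83 | 7 / 17 | 1 / 3 | 1 / 1 | —
36-40 | 43 | 6 | 23 / 414 | 6 / 277 | 8 / 91 | 6 / 23 | 2 / 5 | 0 / 1 | 0 / 0
36-40 | 44 | 7 | 17 / 431 | 5 / 282 | 6 / 97 | 3 / 26 | 2 / 7 | 1 / 2 | 0 / 0
36-40 | 45 | 8 | 10 / 441 | 3 / 285 | 3 / 100 | 2 / 28 | 1 / 8 | 0 / 2 | 0 / 0
36-40 | 46 | 9 | 6 / 447 | 1 / 286 | 1 / 101 | 1 / 29 | 0 / 8 | 0 / 2 | 0 / 0
36-40 | 47 | 10 | 3 / 450 | 0 / 286 | 1 / 102 | 1 / 30 | 1 / 9 | 0 / 2 | 0 / 0
41-45 | 43 | 1 | 43 / 43 | 38 / 38 | 1 / 1 | — | — | — | —
41-45 | 44 | 2 | 23 / 66 | 17 / 55 | 3 / 4 | 1 / 1 | — | — | —
41-45 | 45 | 3 | 12 / 78 | 9 / 64 | 2 / 6 | 0 / 1 | — | — | —
41-45 | 46 | 4 | 5 / 83 | 2 / 66 | 1 / 7 | 0 / 1 | — | — | —
41-45 | 47 | 5 | 2 / 85 | 0 / 66 | 1 / 8 | 1 / 2 | — | — | —
41-45 | 48 | 6 | 2 / 87 | 0 / 66 | 1 / 9 | 0 / 2 | — | — | —
41-45 | 49 | 7 | 1 / 88 | 0 / 66 | 0 / 9 | 0 / 2 | — | — | —
41-45 | 50 | 8 | 0 / 88 | 0 / 66 | 0 / 9 | 0 / 2 | — | — | —
41-45 | 51 | 9 | 0 / 88 | 0 / 66 | 0 / 9 | 0 / 2 | — | — | —
41-45 | 52 | 10 | 0 / 88 | 0 / 66 | 0 / 9 | 0 / 2 | — | — | —

Separation into parities cannot be accurately made after 10 years' duration of marriage.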
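The bookkeeping rule of paragraph (4) can be made concrete with a short Python sketch. The function names are hypothetical. The rules taken from the text are that a cohort gains one year of age and one year of marriage duration each calendar year, and that the first year of marriage (the Registrar-General's 0 < 1) is Year 1. Extending this to a general mapping of duration n onto the interval (n − 1) < n is an assumption.

```python
# Hypothetical helpers illustrating paragraph (4): a cohort defined in 1939
# gains one year of age and one year of marriage duration each calendar year.

def cohort_position(age_range_1939, year):
    """Return ((lowest age, highest age), duration) for the cohort in `year`."""
    lo, hi = age_range_1939
    elapsed = year - 1939
    if elapsed < 0:
        raise ValueError("the cohorts are followed from 1939 onwards")
    # Duration 1 (Year 1) is the first year of marriage, observed in 1939.
    return (lo + elapsed, hi + elapsed), 1 + elapsed


def registrar_general_interval(duration):
    """Assumed mapping: duration n <-> the Registrar-General's '(n - 1) < n'."""
    return f"{duration - 1} < {duration}"


for year in range(1939, 1944):
    (lo, hi), duration = cohort_position((16, 20), year)
    print(year, f"aged {lo}-{hi}", f"duration {duration}",
          f"(R.-G. {registrar_general_interval(duration)})")
```

Run for the cohort aged 16-20 in 1939, the loop reproduces the sequence 16-20, 17-21, 18-22, 19-23, 20-24 given in the text.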

(5) Owing to the condensed form of the
Registrar-General's Tables, it is not at present possible to compute
accurate values for durations after the first 10 years.
Whilst, therefore, values at higher durations for the two
earlier cohorts are shown in Fig. 4, these estimates are
not used in the analysis.

(6) The maternity rates were finally reduced to live 
birth rates. The results for one cohort, 1939, are given 
in Appendix Table A. 
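In Appendix Table A each "Accum." figure is the running total of the "Yearly" figures above it. The sketch below uses the Total column for the cohort aged 16-20 in 1939 and checks the printed running totals against recomputed ones. A check of this kind is also a convenient way of spotting transcription errors in a table of yearly and accumulated rates.

```python
from itertools import accumulate

# "Total" column of Appendix Table A for the cohort aged 16-20 in 1939:
# yearly live birth rates per 1,000 married women, durations 1 to 12.
yearly = [465, 283, 230, 209, 181, 179, 144, 182, 154, 118, 115, 105]
printed_accum = [465, 748, 978, 1187, 1368, 1547, 1691, 1873, 2027, 2145, 2260, 2365]

# The accumulated rate at duration d is the sum of the yearly rates 1..d.
accumulated = list(accumulate(yearly))

# Durations at which the printed running total disagrees with the recomputed one.
mismatches = [
    (duration, recomputed, printed)
    for duration, (recomputed, printed) in enumerate(zip(accumulated, printed_accum), start=1)
    if recomputed != printed
]
print(accumulated[-1], mismatches)
```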

(7) The labour involved in these computations is 
directly related to the condensed form of the Registrar- 
General’s Tables. If durations of over 10 years of 
marriage were tabulated by single years, and separate 
parities given by single years of age, the value of the Tables 
would be immensely enhanced. It is a measure of the
English mores that the Registrar-General's Tables of
Fertility occupy 36 pages, whereas the Tables of Mortality
occupy about 300. Presumably the community is more
interested to learn that fewer females are dying of
International List Classification 162 (“Old Age”) than to know
whether the community is or is not replacing itself. 
STATURE OF SCOTSMEN AGED 18 TO 40 YEARS IN 1941


BY 


E. M. B. CLEMENTS and KATHLEEN G. PICKETT 


Department of Anatomy, University of Birmingham 


Between 1939 and 1946, over 7 million men were 
examined by the medical boards of the Ministry of 
Labour and National Service out of a possible 
population of about 10 million. The object of the 
examination was to separate the fit from the unfit 
for military service. The records give information 
about both these classes, and provide a mass of 
data—which could hardly have been collected by 
normal methods of research—about the relationship 
between certain body measurements, occupation, 
age, place of birth, and medical grade. 

These records have been made available to us, 
and are being used to estimate the distribution of 
stature, weight, and chest circumference in the 
population, and to define such variations in these 
measurements as can be related to age, domicile, 
and occupation. Knowledge about the way stature 
and weight vary in the population is useful in dealing 
with a number of practical problems, such as the 
assessment of nutritional status, and the standardiza- 
tion of the dimensions of equipment and clothing. 
A preliminary study has been made of the data for 
Scotland, and the findings for stature are reported 
here. 


MATERIAL 

Measurements of stature, weight, and chest circumference
were recorded with certain social data in the
course of the National Service medical examination,
and these records form the basis of the analysis. Several
considerations indicate that they are sufficiently reliable
for the purpose of the present study. For instance, the
pattern of differences between the means of the various
sub-classifications is similar in the records of each
medical board in our sample.

Because of the war-time policy of call-up by age 
group, and of the reservation in civilian service of 
certain occupations, some age groups are better represented
than others in the records, and certain occupations
appear less frequently than would be expected from their 
proportions in the census. There is no reason to suppose, 
however, that those belonging to any occupation or 
age group who were called for examination were in any 
way selected on physical grounds. It may, therefore, 
be taken that the data abstracted from the records 
refer to a random sample within occupations and age 
groups, but that the numbers in the groups are not in 
proportion to their occurrence in civilian life. 


The survey has been based on records taken in 1941, 
a period when the call-up was very wide and likely to 
cover the greatest number of occupations and age groups, 
and it appears that the records of this year cover 
approximately half the age range of the working 
population. 

Blind and known mentally-defective persons were 
exempt from registration under the National Service 
Acts. Obvious cripples and men who were unable to 
attend for examination without escort were not examined, 
and a few subjects who were obviously unfit were not 
completely examined and have incomplete records. Such 
persons, who cannot be regarded as full members of the 
working population, have been excluded from this
analysis.


TREATMENT OF DATA 


The results of previous surveys of the stature 
of adult males in England and Scotland suggest 
that about 200 records constitute the smallest 
sample from each board which will permit of the 
necessary sub-classifications by age, social class, and 
place of origin. It appeared from the records that 
some medical boards dealt with the candidates 
systematically in different groups: for example, 
all the candidates at one session might occasionally 
be either volunteers or conscripts; or follow the
same occupation; or have a similar medical history.

The records are bound in volumes. Those made 
available to us covered the period January 1, 1941, 
to March 31, 1941, with the exception of a few 
boards which dealt with sparsely populated areas. 
The records were sampled in the following way: 


No. of volumes in period | Sampling fraction | No. of volumes selected
20-25 | 1/4 | 5-6
15-20 | 1/3 | 5-6
10-15 | 1/2 | 5-7
7-10 | 2/3 | 5-6
5-7 | 1/1 | 5-7


Of the records of the boards which dealt with 
sparsely populated areas, five consecutive volumes 
were taken, starting with January 1, 1941, regardless 
of the period covered. 

The records contained in the volumes were 
sampled systematically, by taking every third entry 
of a first examination of men born in England, 
Wales, or Scotland. This procedure gave a sample of 
approximately 200 records for each medical board. 
To ensure that each subject contributed only one
entry to the present analysis, only first examinations
were abstracted.
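The entry-level sampling rule (every third first examination of a man born in England, Wales, or Scotland) can be sketched as follows. The record layout and all names are hypothetical. The text does not say whether "every third" was counted over eligible entries only or over all entries, nor where counting began, so counting over eligible entries from the first one is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    # Hypothetical layout of one entry in a medical board's register.
    serial: int
    first_examination: bool
    birthplace: str


ELIGIBLE_BIRTHPLACES = {"England", "Wales", "Scotland"}


def systematic_sample(entries, step=3, start=0):
    """Every `step`-th eligible entry, counting from eligible position `start`."""
    eligible = [
        e for e in entries
        if e.first_examination and e.birthplace in ELIGIBLE_BIRTHPLACES
    ]
    return eligible[start::step]


# Twelve entries in register order: every fourth is a re-examination and the
# last one is a man born outside Great Britain.
register = [RegisterEntry(i, i % 4 != 0, "Scotland") for i in range(11)]
register.append(RegisterEntry(11, True, "Ireland"))
print([e.serial for e in systematic_sample(register)])
```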


INFORMATION ABSTRACTED 


Town in which Board was Held.—This represents 
the area of the subjects’ residence at the time of 
examination. These towns have been grouped into 
regions according to the Registrar-General’s geo- 
graphical classification. 


Age in Years.—This was calculated from the date 
of birth to the date of examination. 


Place of Birth.—Coded into county and administrative
area, from the Registrar-General's Index of
Place Names. The Index of Scottish Place Names 
appears as an appendix to Volume 2 of the Report 
on the 1931 Census of Scotland. 


Stated Occupation.—For the sake of uniformity, 
because there is no satisfactory alternative, and 
because it is the basis of the census, occupations 
have been coded according to the 1950 classification 
of the General Register Office. Related occupations 
are grouped by the Registrar-General into “Occupation
Orders”. Occupations are also grouped into five
categories called by the Registrar-General “Social
Classes”. The same occupation order may relate
to two or more social classes. We have retained 
these terms, and (with the single exception of the 
transfer of students to Class 2) have employed the 
Registrar-General’s classification throughout the 
present analysis. 

The records of the separate medical boards for
Class 1 are too few for independent analysis, and
they have been combined with Class 2 to form a
composite class referred to as “Class 1/2” in the
main part of the analysis. These two classes can
be examined separately after the samples from the
medical boards have been combined into a “national”
sample (see below).


Anthropometric Data.—Measurements of body 
weight, stature, and chest circumference have been 
copied as recorded, fractions being converted
to the first place of decimals. To avoid introducing
any systematic error, one-quarter was taken
down to 0·2, and three-quarters up to 0·8.
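A minimal Python sketch of the conversion rule, assuming that records are given as whole inches plus a quarter-inch fraction (the function name and input format are hypothetical). It also shows the presumable reason for rounding ¼ down but ¾ up: the two rounding errors are −0·05 and +0·05 in., so they cancel when quarters occur about equally often, whereas rounding both in the same direction would bias the means.

```python
from fractions import Fraction

# Decimal recorded for each quarter-inch fraction, per the rule in the text:
# 1/4 is taken down to 0.2 and 3/4 up to 0.8 (1/2 is exact at 0.5).
QUARTER_TO_DECIMAL = {
    Fraction(0): Fraction(0),
    Fraction(1, 4): Fraction(2, 10),
    Fraction(1, 2): Fraction(5, 10),
    Fraction(3, 4): Fraction(8, 10),
}


def to_one_decimal(whole_inches, fraction="0"):
    """Convert e.g. (66, '3/4') to 66.8 using the quarter-inch rule."""
    frac = Fraction(fraction)
    if frac not in QUARTER_TO_DECIMAL:
        raise ValueError(f"unexpected fraction {fraction!r}; only quarters are handled")
    return float(whole_inches + QUARTER_TO_DECIMAL[frac])


# Rounding error for each quarter: 0, -1/20, 0, +1/20 inch.
rounding_errors = [decimal - quarter for quarter, decimal in QUARTER_TO_DECIMAL.items()]
print(to_one_decimal(66, "1/4"), to_one_decimal(66, "3/4"), sum(rounding_errors))
```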
The medical boards measured height in inches
and fractions of an inch, with the subjects standing 
erect against a height-measuring standard, without 
shoes. Apparently there were no directions regu- 
lating the precise posture of the head when the 
men were measured. 

A frequency distribution of stature showed that 
most measurements were made to the nearest 
inch and half inch, with some at the intermediate 
quarters. 

Previous studies by Davenport, Steggerda, and 
Drager (1934) have shown that “observer error” 
is a relatively insignificant factor in the measurement 
of stature. Any differences in the technique used 
by various boards when making the measurements 
will increase the variance between the boards and 
will be included in the geographical factor. 

Medical Grade.—In certain circumstances the
grading was deferred, either for a specialist's
examination or because the board proposed a further
examination at a later date. Such cases have been
assigned to a group referred to below as
“unclassified”. Medical grades 1 and 2 have been
classed as “Fit”, and 3 and 4 as “Unfit”.

The information abstracted was coded and 
punched on to Hollerith cards from which the 
statistical analysis has been made. The necessary 
sums of squares and cross products have been 
obtained by the technique of progressive digiting 
(Eckert, 1941). 

Most of the tests of significance have been made 
by an analysis of variance. Because its effect on 
adult stature is so small, age has not been stan- 
dardized in the tests of geographical variation. For 
the more detailed analyses of the occupational 
groups, age has been standardized by making the 
tests of significance in the form of a covariance 
analysis with age. 
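The covariance analysis with age can be illustrated by a one-way analysis of covariance with a single covariate. The sketch below uses the standard textbook formulae: residual sums of squares about the within-group and total regressions of stature on age, with N − k − 1 degrees of freedom for the adjusted within-group term. It is not a reconstruction of the authors' Hollerith computations, and the function name is hypothetical. With four social classes and 3,092 men it gives 3 and 3,087 degrees of freedom, which agrees with those quoted later for the social-class comparison.

```python
def ancova_f(groups):
    """One-way analysis of covariance with one covariate.

    `groups` is a list of (xs, ys) pairs, e.g. (ages, statures) for each class.
    Returns (F, df_between, df_within) for group differences in y adjusted for x.
    """
    k = len(groups)
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n = len(all_x)
    gx, gy = sum(all_x) / n, sum(all_y) / n

    # Total sums of squares and cross products about the grand means.
    txx = sum((x - gx) ** 2 for x in all_x)
    tyy = sum((y - gy) ** 2 for y in all_y)
    txy = sum((x - gx) * (y - gy) for x, y in zip(all_x, all_y))

    # Within-group sums of squares and cross products about the group means.
    wxx = wyy = wxy = 0.0
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        wxx += sum((x - mx) ** 2 for x in xs)
        wyy += sum((y - my) ** 2 for y in ys)
        wxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    # Residual sums of squares after removing the linear regression on x.
    resid_within = wyy - wxy ** 2 / wxx   # N - k - 1 degrees of freedom
    resid_total = tyy - txy ** 2 / txx    # N - 2 degrees of freedom
    df_between, df_within = k - 1, n - k - 1
    f = ((resid_total - resid_within) / df_between) / (resid_within / df_within)
    return f, df_between, df_within


# Two hypothetical classes measured at the same ages, one taller by 2 in.
ages = [18, 20, 22]
print(ancova_f([(ages, [66.0, 67.0, 66.5]), (ages, [68.0, 69.0, 68.5])]))
```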

Differences between means for which P ≤ 0·01
have been regarded as statistically significant,
except when examining the geographical factor,
where the 5 per cent. level (P = 0·05) has been used.


ANALYSIS OF DATA 
The data were obtained from seventeen medical 
boards, grouped for the analysis into the regions 
used by the Registrar-General for Scotland. 


TABLE I
AGE DISTRIBUTION OF THE SAMPLE

Age (years) | 17 | 18 | 19 | 20-24
Number | 4 | 124 | 944 | 736
Percentage | 0·11 | 3·36 | 25·57 | 19·93
TABLE II
MEAN STATURE FOR REGIONS BY AGE GROUPS

Region | 17-19 | 20-29 | 30-39 | 40-42 | All Ages
Southern | 67·14 ± 0·24 (130) | 67·29 ± 0·21 (140) | 66·45 ± 0·24 (138) | — | 66·96 ± 0·13 (409)
West Central | 66·73 ± 0·14 (366) | 66·87 ± 0·14 (384) | 66·30 ± 0·15 (339) | 66·59 ± 0·56 (14) | 66·65 ± 0·08 (1,103)
East Central | 66·91 ± 0·15 (258) | 67·05 ± 0·14 (320) | 66·40 ± 0·14 (317) | — | 66·78 ± 0·08 (895)
Northern | 67·17 ± 0·14 (318) | 66·96 ± 0·14 (337) | 66·78 ± 0·11 (601) | 67·47 ± 0·52 (29) | 66·94 ± 0·07 (1,285)
Total | 66·96 ± 0·08 (1,072) | 66·99 ± 0·08 (1,181) | 66·55 ± 0·07 (1,395) | 67·22 ± 0·39 (44) | 66·82 ± 0·04 (3,692)

The means are given in inches, and the numbers in each sample in brackets.


The sampling of the volumes gave a total of
3,812 records, of which 120 (3·15 per cent. of the
total, nearly all of them men with gross disability) were
rejected because of incomplete information. This
left a sample of 3,692 for analysis.

Table I shows the distribution of the 3,692 
complete records by age group. Nearly 60 per cent. 
fall between the ages of 19 and 29. 
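The counts given in the text and in Table I can be cross-checked with a few lines of Python. This is a simple recomputation, not part of the authors' method. Only the age columns that survive in Table I (17 to 20-24) are included, and recomputing the percentage for age 19 from the count of 944 gives 25·57.

```python
# Counts taken from the text and from Table I.
records_abstracted = 3812
records_rejected = 120
complete_records = records_abstracted - records_rejected

age_counts = {"17": 4, "18": 124, "19": 944, "20-24": 736}


def percent(count, total):
    """Percentage of `total`, rounded to two decimals as in Table I."""
    return round(100 * count / total, 2)


print(complete_records, percent(records_rejected, records_abstracted))
for age, count in age_counts.items():
    print(age, count, percent(count, complete_records))
```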

The distribution of the whole sample by “Social 
Class” is shown in Table V. The relative proportions 
of the classes were found to differ considerably 
from one area to another. 


Stature.—This was examined first because it is 
of great practical importance, and because many 
sub-dimensions of body length (e.g. leg length) 
which affect the dimensions of equipment (e.g. chairs) 
are closely correlated with it. As was anticipated,
it proved to be normally distributed, so that tests
of significance could be applied with confidence.

The mean statures for the regions of Scotland
are set out in Table II. The mean for the whole
sample of 3,692 records is 66·82″ ± 0·04, which
is significantly less than the mean of 67·0″ ± 0·03
for the 9,623 (native) Scotsmen aged 20 measured
in 1939 (Martin, 1949). That this is probably due
to differences in the age range of the samples is
indicated by the fact that the mean of our 19-year-old
sample (66·99″ ± 0·09) is practically identical
with the 1939 mean.

The data presented in Table II are made up of 
records from all social classes, and as already 
observed the proportions vary from area to area. 
The data are broadly grouped on a regional basis, 
but the areas are large and may contain limited 
sub-areas showing significant variations in stature. 
These may be grouped and termed “geographical 
factors”. These data are therefore regarded as 
heterogeneous, and the analysis has been arranged 
so that these factors may be examined separately, 
and the detailed analysis made on homogeneous 
samples. 


The mean statures for each medical board and 
each different region have been compared only 
within the same social class. The mean stature 
for each medical board within a region was obtained 
and the significance of the differences between 
means was tested by an analysis of variance. This 
method provided information that was sufficiently 
accurate for the present purpose. The slight gain in 
accuracy that could be obtained from more rigorous 
and complex statistical techniques would not 
justify the additional labour involved. 


The distribution of stature was regarded as 
homogeneous, and the area of agreement termed a 
“homogeneous region” if there was no significant 
difference between the mean statures of the samples 
of the same social class from the various medical 
boards. When a significant difference was found, 
the sample for the medical board showing the 
greatest discrepancy when compared with the mean 
for the whole region was removed from the com- 
bination of boards for separate consideration. The 
regional total and variance were adjusted accordingly 
and the figures re-tested for homogeneity. If they 
proved to be homogeneous, this combination of 
boards was classified as an “adjusted region”. If
not, the process was repeated until one was defined. 
Thus 68 samples relating to four social classes 
from seventeen medical boards in the four regions 
have been examined. The homogeneous means for 
the social classes are presented in Table III (overleaf). 
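The iterative test for homogeneity described above can be sketched in Python. Several details are assumptions rather than statements of the authors' procedure: raw statures are available for each board, the test is a one-way analysis of variance, the 5 per cent. level used for the geographical factor applies, and the "greatest discrepancy" is the largest absolute difference between a board's mean and the pooled mean of the boards still in the combination. The F-distribution tail probability is computed from the regularized incomplete beta function by the standard continued-fraction method (as in Numerical Recipes), so no external library is needed. `scipy.stats.f.sf` would serve equally well.

```python
from math import exp, lgamma, log

_TINY = 1e-300


def _nonzero(v):
    # Guard used by the modified Lentz algorithm to avoid division by zero.
    return v if abs(v) > _TINY else _TINY


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-12:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with df1 and df2 degrees of freedom."""
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def one_way_anova(samples):
    """(F, df_between, df_within) for a dict: board name -> statures."""
    values = [v for vs in samples.values() for v in vs]
    n, k = len(values), len(samples)
    grand = sum(values) / n
    means = {name: sum(vs) / len(vs) for name, vs in samples.items()}
    ssb = sum(len(vs) * (means[name] - grand) ** 2 for name, vs in samples.items())
    ssw = sum((v - means[name]) ** 2 for name, vs in samples.items() for v in vs)
    df1, df2 = k - 1, n - k
    if ssw == 0:
        return (0.0 if ssb == 0 else float("inf")), df1, df2
    return (ssb / df1) / (ssw / df2), df1, df2


def homogeneous_region(boards, alpha=0.05):
    """Remove the most discrepant board until the rest test as homogeneous.

    Returns (names of boards kept, names removed in order of removal).
    """
    kept, removed = dict(boards), []
    while len(kept) > 1:
        f, df1, df2 = one_way_anova(kept)
        if f_upper_tail(f, df1, df2) >= alpha:
            break
        pooled = [v for vs in kept.values() for v in vs]
        regional_mean = sum(pooled) / len(pooled)
        worst = max(kept, key=lambda b: abs(sum(kept[b]) / len(kept[b]) - regional_mean))
        removed.append(worst)
        del kept[worst]
    return sorted(kept), removed


# Three hypothetical boards in one region; board C is markedly taller.
boards = {
    "A": [66.0, 67.0, 68.0, 67.0],
    "B": [66.5, 67.5, 67.0, 67.0],
    "C": [72.0, 73.0, 74.0, 73.0],
}
print(homogeneous_region(boards))
```

In this toy example board C is set aside for separate consideration, and boards A and B form the "adjusted region".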


The means for the homogeneous and adjusted
regions proved to be homogeneous for each social
class, and the regional data have been combined
into what is termed a “national sample” for each
social class. These are the totals of the classes
given in Table III. The national sample for each
social class applies to the whole of Scotland (less the
heterogeneous areas segregated as “separate”
areas), and defines an area over which the stature
of the class is statistically homogeneous. In this
way, geographical variation has been eliminated
and large pooled samples built up for the detailed
analysis of occupational differences.

TABLE III
MEAN STATURE RELATED TO SOCIAL CLASS AND GEOGRAPHICAL REGION

Region | Classes 1 and 2* | Class 3 | Class 4 | Class 5
Southern | 68·42 ± 0·46 (28) | — | 66·98 ± 0·31 (85) | —
West Central | 67·55 ± 0·25 (105) | 66·97 ± 0·12 (542) | 66·17 ± 0·19 (188) | 66·32 ± 0·18 (174)
East Central | 68·03 ± 0·30 (77) | 66·86 ± 0·11 (455) | 66·27 ± 0·22 (136) | 66·51 ± 0·17 (227)
Northern | 67·83 ± 0·20 (129) | 66·75 ± 0·12 (464) | 66·52 ± 0·20 (163) | 66·43 ± 0·15 (319)
Total† (a) | Class 1: 68·93 ± 0·37 (38); Class 2: 67·70 ± 0·14 (301) | 66·87 ± 0·07 (1,461) | 66·42 ± 0·11 (572) | 66·44 ± 0·10 (720)
Means recorded in 1883‡ (b) | Class 1: 69·14 ± 0·22; Class 2: 67·95 ± 0·18 | 66·61 ± 0·13 | — | —
(a) − (b) | Class 1: 0·79 ± 0·43; Class 2: 0·25 ± 0·23 | 0·26 ± 0·15 | — | —

* In the analyses Social Classes 1 and 2 are combined as Class 1/2, mean 67·84 ± 0·13 (339).
† The mean of the combined Social Classes (“pooled national sample”) = 66·79 ± 0·05 (3,092).
‡ Data for 1883 are from British Association Report (1884).
The standard errors of the means have been derived from the probable errors given by Goring (1913).
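The footnote to Table III notes that the 1883 standard errors were derived from probable errors. For a normally distributed estimate the probable error is the half-width of the central 50 per cent. interval, so PE = z(0·75) × SE with z(0·75) ≈ 0·6745. A minimal sketch of the conversion follows (function names hypothetical). Goring's probable errors are not reproduced here, so the example runs the conversion in reverse from the 1883 Class 3 standard error of 0·13 in.

```python
from statistics import NormalDist

# A probable error is the half-width of the central 50 per cent of a normal
# distribution, so PE = z(0.75) * SE, with z(0.75) approximately 0.6745.
PE_PER_SE = NormalDist().inv_cdf(0.75)


def se_from_pe(probable_error):
    return probable_error / PE_PER_SE


def pe_from_se(standard_error):
    return standard_error * PE_PER_SE


# The 1883 Class 3 standard error of 0.13 in. corresponds to a probable error of:
print(round(pe_from_se(0.13), 3))
```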

The national samples of all the social classes have
been combined into the “pooled national sample”,
which represents the weighted mean stature of
Scotland (excluding the heterogeneous areas), taking
account of the relative number in each class. This
mean is 66·79″ ± 0·05 (3,092). Ten heterogeneous
boards were found: three in Social Class 3, two in
Class 4, and five in Class 5. Together they make up
600 records and account for the difference between
the total numbers in Tables II and III. These separate
areas have been examined apart from the main sample,
in the section on geographical factors.
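The pooled national mean is the mean of the class means in Table III, weighted by the number of men in each class. Recomputing it from the rounded class means in Python reproduces the published 66·79″, and the difference between the totals of Tables II and III gives the 600 records from the "separate" boards. The standard error of ± 0·05 depends on the individual measurements and is not recomputed here.

```python
# National-sample means (inches) and numbers for each social class, Table III.
national = {
    "1/2": (67.84, 339),
    "3": (66.87, 1461),
    "4": (66.42, 572),
    "5": (66.44, 720),
}

n_total = sum(n for _, n in national.values())
pooled_mean = sum(mean * n for mean, n in national.values()) / n_total
records_in_separate_boards = 3692 - n_total

print(n_total, round(pooled_mean, 2), records_in_separate_boards)
```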


Geographical Factors.—It is apparent from Table II 
that there is a tendency for the mean stature of 
the West Central Region to be the shortest in each 
age-group, although the actual differences are 
small in each case. The more detailed examination 
confirms that the geographical variation is slight 
within the social classes. Significant variation is 
limited to two small areas which show relatively 
large differences in mean stature when compared 
with the rest of the region. 

The means of the samples from the north of the 
mainland for Social Classes 3 and 4, and from the 
Shetland Islands for Social Classes 4 and 5, are 
about one inch above the means of the national 
samples for these classes. These differences are 
significant. The means of the samples from Paisley 
and Glasgow for Social Class 5 are about one inch 
below the mean of the national sample for this class. 
This difference is also significant. 

The means of the samples for Social Classes 3 
and 5 of the two boards of the Southern Region 
differ significantly, but little importance is attached 
to this difference, which may be due to a high
proportion of newcomers into the region. The
samples are not included in the national samples. 

These records have been analysed, although most 
of the samples are too small to allow of detailed 
examination. 

There is no significant difference between the 
stature means for the different social classes in 
Lerwick (F = 2·5, d.f. 3). For Wick, there is a
significant difference (F = 4·42, d.f. 3), almost
certainly due to the low mean stature of Social
Class 5. When the samples for Social Classes 3, 4,
and 5 for Lerwick and Wick are pooled, the mean
of the combined sample is 67·48″ ± 0·04, and
is not significantly different from the mean of
Class 1/2 of the Northern Region (t = 1·42).
The mean statures of Classes 3, 4, and 5 are not 
only similar to those generally found in Class 1/2 
but the distribution of stature is homogeneous 
in the area. 

With the exception of the Southern Region, 
there is no significant difference between the mean 
stature of the samples of the “separate” areas
for the same region and social class. These samples, 
with the exception of those from the Southern 
Region, have therefore been pooled, and classified 
by occupation into the occupation orders. No 
significant difference between the means was found 
when the means and variance of stature were 
computed for these sub-samples. It is apparent 
that the significantly different mean statures of these
areas cannot be accounted for by any one 
occupational group. 

The mean stature for Social Class 1/2 is the same 
in all regions. 

Martin’s analysis (1949) of the Scottish data for 
1939 was based on a regional classification different 
from that of the Registrar-General of Scotland, 
and revealed a difference of 1·1″ between the means
of the tallest and shortest regions. It also showed
that the Highland area (approximately our Northern 
Region) had the highest mean stature, and the 
Glasgow and Paisley areas the lowest. Tocher (1924) 
gave the stature for seven regions of Scotland 
derived from height measurements of Scottish 
soldiers in 1916. These may have been subject 
to some selection, as some minimum standard of 
stature may have been required on recruitment. 
Nevertheless his figures also suggest that the tallest 
samples came from the North and the shortest 
from the lower Clyde valley. 

A Report presented by the Anthropometric 
Committee of the British Association in 1883 
(published in 1884) gave the average stature for 
various regions of Scotland, with the number 
in each sample. The mean of 68·71″ for the whole
sample of 1,369 records is remarkably high when
compared with the 1941 figure of 66·82″. It is
even significantly greater (by 0·9″) than the 1941
figure for Class 1/2. Table V shows that these earlier
data paid little attention to occupational classification,
and that a high proportion of the subjects
measured were professional men. A re-assessment
of these data indicates that only the two tallest
samples (from the areas of (i) Kirkcudbright,
Ayr, and Wigtown, and (ii) Edinburgh, Linlithgow,
Haddington, and Berwick) differ significantly from
the other geographical samples studied, and that
all the variation found between the other samples
may be ascribed to random errors of sampling.

The likelihood is that we are dealing here with 
genetic factors, possibly due to migration and 
selection. 


Stature related to Social Class.—The main 
variation in adult stature appears to be associated 
with social class. A test of significance comparing 
the means of the national samples of the classes 
(Table III) shows that the differences are statistically 
significant (F = 26·9, d.f. 3 and 3,087). The tallest
mean is found in Class 1/2, followed by that of
Class 3; the means of Classes 4 and 5 are lower and
close together. The difference between the means of
Classes 4 and 5 is not significant. There is a
significant difference of about one inch between the
constituent classes of Class 1/2 (t = 3·1). The
difference of one inch between Class 1/2 and
Class 3 is also significant (t = 6·52). The differences
between Classes 3 and 4 and between Classes 3 and 5
(each about ½″) are significant (t = 3·5 and 3·6 respectively).
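The t values quoted for differences between class means follow the usual form t = (m₁ − m₂)/√(s₁² + s₂²), where s₁ and s₂ are the standard errors of the two means. The sketch below applies it to the rounded means and standard errors of Table III, and to the 1939 and 1941 national means discussed earlier. The results come out at about 3·11, 6·57, 3·45, 3·52 and 3·60, close to the published values. Small discrepancies are to be expected, presumably because the published figures were computed from unrounded data.

```python
from math import sqrt


def t_difference(mean1, se1, mean2, se2):
    """t for the difference between two independent means."""
    return (mean1 - mean2) / sqrt(se1 ** 2 + se2 ** 2)


# (mean, s.e., mean, s.e.) in inches, from Table III and the text.
comparisons = {
    "Class 1 v. Class 2": (68.93, 0.37, 67.70, 0.14),
    "Class 1/2 v. Class 3": (67.84, 0.13, 66.87, 0.07),
    "Class 3 v. Class 4": (66.87, 0.07, 66.42, 0.11),
    "Class 3 v. Class 5": (66.87, 0.07, 66.44, 0.10),
    "1939 (Martin) v. 1941": (67.0, 0.03, 66.82, 0.04),
}
for label, values in comparisons.items():
    print(f"{label}: t = {t_difference(*values):.2f}")
```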

Table III shows that differences between the mean
statures of the social classes are also apparent in
the data of 1883.
The present analysis shows that stature in Scotland
varies more between socio-economic groups than
it does between geographical regions. The implication
of this fact, which also applies to other countries
(e.g. Lundborg and Linders, 1926), can only be
that different occupations attract men of different
height, or that men in different sections of the
community are attracted to different kinds of
occupation. The second, which is the commonsense
and more likely alternative, is supported by the
observation that the highest mean stature is found
in Social Class 1, from which men in the professions,
as opposed to other occupations, are drawn.


Stature related to Age.—The British Association 
Anthropometric Committee of 1883 wrote: 

... it is probable that little actual growth takes place
after the age of 21 years, and that it entirely ceases by 
the 25th year. Full stature is attained earlier in the well- 
fed and most favoured class than in the ill-fed and least 
favoured classes of the community. 

This observation has since been amply confirmed; 
Morant (1950) showed that the mean stature of 
present-day regular recruits into the Royal Air 
Force does not increase after about 20 years, 
whereas the mean stature increased until 24 years 
of age in recruits into the Army in 1913, and until 
26 years in recruits in 1880. Kemsley (1950) found 
that in 1943 the mean stature of a composite sample 
of English and Scottish industrial workers did not 
increase after 22 years of age. 

The data dealt with here indicate that Scotsmen of all social classes reach adult stature by their 19th year. The British Association Anthropometric Committee analysed data in 1883. Those data showed that individuals corresponding to Social Class 1/2 then reached mature stature at about 18 to 19 years. Those corresponding to our Social Classes 3, 4, and 5 reached it at about 22 years. Thus there has been no change in the mean stature at maturity over the past 70 years. However, men in the non-professional classes now reach full stature at a progressively earlier average age. Furthermore, the growth rates of the different classes no longer differ after 18 years of age.

Table II shows a tendency for the mean stature 
to be less in the over-30 age group than in the 
over-20 age group. The over-40 age group does not 
apparently follow this trend, but here the samples 
are very small. The trend has been examined in 
detail, and such slight changes in adult stature as 
occur after the age of 18 years may adequately be 
described by a linear regression.* The regression 
*Linear regression equations fit the data well. Curved regressions of the form $y = a + bx + cx^2$ do not improve the fit significantly. Here $a$ is a constant, $b$ and $c$ are coefficients, $x$ is age in years, and $y$ is stature.
TABLE IV
REGRESSION COEFFICIENTS OF STATURE ON AGE (IN INCHES PER YEAR) FOR AGE RANGE 18-40 YEARS

                         | Class 1 and 2  | Class 3        | Class 4        | Class 5        | Pooled
Age Group                | n      b       | n      b       | n      b       | n      b       | n      b and S.E.
17-19                    | 81     0·917   | 488    0·481   | 158    0·281   | 146    0·656   | 873    0·254 ± 0·278
20-29                    | 100    0·028   | 545    0·054   | 194    0·013   | 173    0·055   |
30-39                    | 153    0·007   | 426    0·087   | 215    0·074   | 385    0·115   |
40-42                    | 5      —       | 2      —       | 5      —       | 16     1·151   |
Average                  | 339    0·017   | 1,461  0·061   | 572    0·034   | 720    0·087   | 3,092  −0·032 ± 0·007
Deviation from
  average regressions    | 1949·8797      | 9530·5026      | 3959·6053      | 4775·2226      | 20336·2653
Deviation from
  individual regressions | 1944·0544      | 9513·2733      | 3956·5306      | 4761·2221      | 20307·1936

Average regression coefficient of stature on age over age range 18-40 years for all social classes: −0·032 ± 0·007.


coefficient of height on age for our pooled sample from 18 to 40 years is −0·032 ± 0·007. The negative value of b implies that height begins to decline once adult stature is attained. This observation corroborates Morant’s finding.
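The footnote checks whether a curved regression $y = a + bx + cx^2$ improves on the straight line. That check can be run as an extra-sum-of-squares F test. The sketch below is one way to do this with NumPy. The survey's individual records are not given here, so the ages and statures are simulated. The simulated slope of −0·032 in. per year only mirrors the pooled coefficient quoted above. `linear_vs_quadratic` is a hypothetical helper name.

```python
import numpy as np


def linear_vs_quadratic(age, stature):
    """Fit y = a + bx and y = a + bx + cx^2; return (b, F) for adding the x^2 term."""
    x = np.asarray(age, dtype=float)
    y = np.asarray(stature, dtype=float)
    n = x.size

    linear = np.polyfit(x, y, 1)      # coefficients [b, a]
    quadratic = np.polyfit(x, y, 2)   # coefficients [c, b, a]

    rss_linear = float(np.sum((y - np.polyval(linear, x)) ** 2))
    rss_quadratic = float(np.sum((y - np.polyval(quadratic, x)) ** 2))

    # One extra parameter; the larger model leaves n - 3 residual degrees of freedom.
    f_stat = (rss_linear - rss_quadratic) / (rss_quadratic / (n - 3))
    return float(linear[0]), f_stat


# Simulated data only (hypothetical): ages 18-40, a small linear decline, s.d. about 2.6 in.
rng = np.random.default_rng(1952)
age = rng.uniform(18, 40, size=3000)
stature = 66.8 - 0.032 * (age - 26) + rng.normal(0.0, 2.6, size=age.size)

slope, f_stat = linear_vs_quadratic(age, stature)
print(f"b = {slope:.3f} in. per year; F(1, {age.size - 3}) = {f_stat:.2f}")
```

An F below about 3·84 would mean, as in the footnote, that the curved term adds nothing significant. That value is the 5 per cent. point of F with 1 and many degrees of freedom.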

In Table III is given the mean stature derived 
from all the data for age groups covering the period 
from 18 to 42 years. The mean statures of the first 
two groups are of critical importance, because they 
cover the period in which the change occurs from 
annual increments (growth as usually understood) 
to the adult state. 

Regression coefficients of height on age have been computed separately for three groups: individuals aged 19 and under, the age group 20-29, and the age group 30-39. A corresponding regression for the age group 40-42 of Social Class 5 was also calculated (Table IV). Within each social class, none of these coefficients differs significantly from the coefficient relating stature to age over the whole age range of that class.

In the first age group (up to and including the 19-year-olds), the regression coefficients of stature on age do not differ significantly between the social classes.

A mean regression coefficient, computed for all 
the data by pooling all the social class groups, 
does not differ significantly from the overall mean 
regression for each social class. 
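Comparing regression coefficients across groups amounts to comparing two residual sums of squares. The first is taken about separate regressions, one per group. The second is taken about parallel lines that share a single pooled within-group slope. The rows of Table IV headed "Deviation from individual regressions" and "Deviation from average regressions" appear to record such sums. However, the paper does not give their degrees of freedom. The minimal sketch below uses simulated groups. The helper name and all data are hypothetical.

```python
import numpy as np


def slope_homogeneity_f(groups):
    """F test of equal regression slopes across groups.

    groups: sequence of (x, y) pairs, e.g. age and stature for each group.
    Returns (F, pooled within-group slope, list of separate slopes).
    """
    sxx_sum = sxy_sum = syy_sum = 0.0
    rss_separate = 0.0
    slopes = []
    n_total = 0
    for x, y in groups:
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        dx, dy = x - x.mean(), y - y.mean()
        sxx, sxy, syy = float(dx @ dx), float(dx @ dy), float(dy @ dy)
        slopes.append(sxy / sxx)
        rss_separate += syy - sxy ** 2 / sxx      # residual SS about the group's own line
        sxx_sum += sxx
        sxy_sum += sxy
        syy_sum += syy
        n_total += x.size

    k = len(slopes)
    rss_common = syy_sum - sxy_sum ** 2 / sxx_sum  # parallel lines sharing one pooled slope
    f_stat = ((rss_common - rss_separate) / (k - 1)) / (rss_separate / (n_total - 2 * k))
    return f_stat, sxy_sum / sxx_sum, slopes


# Hypothetical groups: different intercepts, the same true slope of -0.03 in. per year.
rng = np.random.default_rng(7)
groups = []
for intercept, n in [(68.0, 340), (66.9, 1460), (66.5, 570), (66.6, 720)]:
    age = rng.uniform(18, 40, size=n)
    groups.append((age, intercept - 0.03 * (age - 26) + rng.normal(0.0, 2.6, size=n)))

f_stat, pooled_slope, slopes = slope_homogeneity_f(groups)
print(f"F = {f_stat:.2f}; pooled slope = {pooled_slope:.3f}")
print("separate slopes:", [round(s, 3) for s in slopes])
```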


Stature in 1883 and in 1941.—Mean statures 
of the different social classes in 1883 are given in 
Table III. The differences are very similar to those 
found in 1941; differences between means of the 
two surveys for the same class are not significant. 

The mean statures for 1883 and 1941 cannot be compared area by area, for two reasons. First, only a few samples have the same geographical boundaries in both surveys. Second, the 1883 report did not standardize average statures by birthplace for social class. Table V shows that the distribution by social class is quite different in the two samples. A comparison can nevertheless be made for the Shetland Islands. There, no significant differences were found between the mean statures of the four social classes of 1941 (mean 67·48” ± 0·20). The mean stature for the Shetlands for all classes in 1883 was 67·92” (n = 108). Assuming the same variance for the two sets of data, the difference between the means is not statistically significant.
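The Shetland comparison can be checked as a two-sample test of means. The sketch below makes two assumptions. First, it reads "± 0·20" as the standard error of the 1941 mean. Second, the paper gives no standard deviation for the 1883 Shetland sample. "Assuming the same variance" therefore has to be filled in, and here it is filled in with the national s.d. of 2·61” quoted in the Discussion. Both choices are illustrative, not the authors' stated calculation.

```python
import math

mean_1941, se_1941 = 67.48, 0.20   # Shetland 1941; "± 0.20" read as a standard error
mean_1883, n_1883 = 67.92, 108     # Shetland 1883, all classes
sd_assumed = 2.61                  # ASSUMPTION: national 1941 s.d. applied to the 1883 sample

se_1883 = sd_assumed / math.sqrt(n_1883)
se_diff = math.sqrt(se_1941 ** 2 + se_1883 ** 2)
z = (mean_1883 - mean_1941) / se_diff
print(f"difference = {mean_1883 - mean_1941:.2f} in, z = {z:.2f}")  # z is about 1.37, below 1.96
```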


TABLE V

COMPOSITION BY SOCIAL CLASS OF THE SAMPLES OF 1941 AND 1883

Social | 1941 Sample          | 1883* Sample
Class  | No.     Percentage   | No.      Percentage
1      | 38      1·0          | 10,739   28·6
2      | 301     8·2          | 5,472    14·6
3      | 1,740   47·1         | 12,636   33·6
4      | 645     17·5         | 8,727    23·2 (Classes 4 and 5 together)
5      | 968     26·2         |
Total  | 3,692   100·0        | 37,574   100·0

* From British Association Report (1884).

Stature related to Occupation.—Each “social class” may relate to a number of “occupational orders”. An analysis has been made of the national samples of the social classes. In it, the differences between the means of the occupational orders were tested for significance by a covariance analysis, thereby taking age differences into account. It showed that there were no significant differences between the mean statures of the occupational orders at either the 1 or 5 per cent. level of probability.
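A covariance analysis of this kind compares group means after each has been adjusted to a common age. It does this by fitting parallel regression lines, one intercept per group and one pooled slope. The test then asks whether those intercepts differ. The sketch below is a minimal version using NumPy. The helper `ancova_f` and the simulated "orders" are hypothetical, not the survey's data.

```python
import numpy as np


def ancova_f(groups):
    """F test for equal group means after adjusting for a covariate.

    groups: sequence of (x, y) pairs (e.g. age, stature for each occupational order).
    Assumes a common slope of y on x within groups.
    """
    xs = [np.asarray(x, dtype=float) for x, _ in groups]
    ys = [np.asarray(y, dtype=float) for _, y in groups]
    k = len(xs)
    n = sum(x.size for x in xs)

    def residual_ss(sxx, sxy, syy):
        return syy - sxy ** 2 / sxx

    # One regression line through all the data, ignoring group membership.
    x_all, y_all = np.concatenate(xs), np.concatenate(ys)
    dx, dy = x_all - x_all.mean(), y_all - y_all.mean()
    rss_total = residual_ss(dx @ dx, dx @ dy, dy @ dy)

    # Parallel lines: separate group intercepts, one pooled within-group slope.
    wxx = sum((x - x.mean()) @ (x - x.mean()) for x in xs)
    wxy = sum((x - x.mean()) @ (y - y.mean()) for x, y in zip(xs, ys))
    wyy = sum((y - y.mean()) @ (y - y.mean()) for y in ys)
    rss_within = residual_ss(wxx, wxy, wyy)

    # k - 1 d.f. for the adjusted means; n - k - 1 residual d.f. (k intercepts + 1 slope).
    return float(((rss_total - rss_within) / (k - 1)) / (rss_within / (n - k - 1)))


# Hypothetical orders: the third is 1 in. taller at every age.
rng = np.random.default_rng(11)
orders = []
for offset in (0.0, 0.0, 1.0):
    age = rng.uniform(18, 40, size=1000)
    orders.append((age, 66.8 + offset - 0.03 * (age - 26) + rng.normal(0.0, 2.6, size=1000)))

f_orders = ancova_f(orders)
print(f"F(2, {3 * 1000 - 3 - 1}) = {f_orders:.1f}")
```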

Cathcart, Hughes, and Chalmers (1935) have 
given the mean statures for certain occupational 
groups in Glasgow, but they are not sub-classified 
by social class. Kemsley (1950) has given the mean 
TABLE VI

MEAN STATURE AND DISTRIBUTION OF THE FIT AND THE UNFIT, BY SOCIAL CLASS

      | Fitness                                     | Test of         | Percentage
Class | Fit           | Unfit        | Not Known    | Significance: F | Fit
1/2   | 67·88 (264)   | 67·82 (66)   | 68·22 (9)    | …               | 77·9
3     | 66·83 (1,185) | 66·73 (240)  | 67·41 (36)   | …               | 81·1
4     | 66·45 (445)   | 66·32 (112)  | 66·28 (15)   | …               | 77·8
5     | 66·54 (517)   | 66·46 (180)  | 67·16 (23)   | …               | 71·8
Total | 66·81 (2,411) | 66·69 (598)  | 67·23 (83)   | 1·63            | 78·0

The distribution of fit men in the social classes was examined: χ² = 24·7; d.f. 6; (P < 0·01).

In Tables VI and VII means have been adjusted for age differences by standardizing to 26 years.

The means are given in inches, and the numbers in each sample in brackets.


statures of samples of Scottish miners (66·0”) and of Scottish workers (large firms 66·2”, small 65·7”). Both these sets of figures refer to a standardized age of about 37 years. They should be increased slightly, by say 0·1”, to make them comparable to the means in this survey. Kemsley’s mean statures are almost identical to ours of 66·05” for miners and 66·42” for Social Class 5. The mean statures found by Cathcart and others (1935), with the exception of the clerical order, are less than those found in this survey.

Nine of the larger occupational orders, including 
metal, transport, and commercial workers, were 
examined, taking each social class separately. The 
differences between the mean statures of the samples 
of the occupations within the social classes were 
examined by a covariance analysis, taking age 
differences into account. Within the orders examined 
none of the differences between the stature means 
of the various occupations were significant at the 
1 per cent. level and only one (the professional 
order) was significant at the 5 per cent. level of 
probability. 

Stature related to Fitness.—Within each social class, the mean stature of the unfit is not significantly different from that of the fit or of those “not graded”. No difference emerges when all the classes are combined either (Table VI). A χ² test showed that there are significantly more fit men in Social Class 3, and fewer in Social Class 5, than in the whole population on average (χ² = 24·7, d.f. 6).
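The χ² quoted here is a test of independence on the 4 × 3 table of fit, unfit and not-known men by class in Table VI. The sketch below is a minimal pure-Python version; the function names are hypothetical. The not-known counts for Classes 4 and 5 are taken as 15 and 23. Those are the values implied by the class totals of 572 and 720 (Tables IV and VII) and by the not-known column total of 83. With these counts the statistic comes out at about 24·7 on 6 d.f., matching the text. The upper-tail probability uses a closed form that holds only for an even number of degrees of freedom.

```python
import math

# Rows: Classes 1/2, 3, 4, 5.  Columns: fit, unfit, not known (Table VI).
observed = [
    [264, 66, 9],
    [1185, 240, 36],
    [445, 112, 15],
    [517, 180, 23],
]


def chi_square_independence(table):
    """Pearson chi-square for an r x c table, with (r - 1)(c - 1) degrees of freedom."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (len(col_tot) - 1)


def chi2_upper_tail_even_dof(x, dof):
    """P(chi-square > x); exact closed form, valid only for even dof."""
    if dof % 2:
        raise ValueError("closed form needs an even number of degrees of freedom")
    half = x / 2
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


chi2, dof = chi_square_independence(observed)
p = chi2_upper_tail_even_dof(chi2, dof)
print(f"chi2 = {chi2:.1f}, d.f. = {dof}, P = {p:.4f}")
```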

Some records had to be rejected because they were incomplete; most of these men were unfit for service. Their stature did not differ from that of the able-bodied.


Stature related to Classification as Volunteers and 
Conscripts.—The mean stature of conscripts within 
each social class is not significantly different from 
that of volunteers (Table VII), nor did any difference
emerge when the social classes were pooled. 


TABLE VII

MEAN STATURE AND DISTRIBUTION OF VOLUNTEERS AND CONSCRIPTS

Class | Volunteers   | Conscripts     | Covariance with Age: t | Percentage Volunteers
1/2   | 67·54 (28)   | 67·91 (311)    | 1·17                   | 8·3
3     | 66·95 (154)  | 66·81 (1,307)  | 0·52                   | 10·5
4     | 66·47 (21)   | 66·42 (551)    | 0·35                   | 3·7
5     | 67·01 (3)    | 66·56 (717)    | Not tested             | 0·4
Total | 66·98 (206)  | 66·78 (2,886)  | Not tested*            | 6·7

* Totals, excluding Class 5, t = 0·74.

The distribution of volunteers and conscripts in the social classes was examined: χ² = 90·11; d.f. 3; (P < 0·01).

The means are given in inches, and the numbers in each sample in brackets.

A χ² analysis shows that there are significantly fewer volunteers in Classes 4 and 5 than would be expected on a chance basis, and correspondingly more volunteers in Classes 1/2 and 3 (χ² = 90·11, d.f. 3). The Privy Council Report on Physical
Deterioration (1904) established that casual employ- 
ment (which corresponds to the present Social 
Class 5) was the occupational background of most 
volunteer recruits at that time. In 1941 most 
volunteers belonged to Social Classes 1, 2, and 3. 
This may be only a characteristic feature of war-time 
recruiting, but it confirms the conclusions of the 
Privy Council Report, that recruiting departments 
do not as a rule deal with a representative sample 
of the population. 


DISCUSSION 


As observed at the beginning of this paper, the 
present study is the first of a series designed to 
show the distribution in Britain of certain anthropo- 
metric measurements that bear on the assessment 
of nutritional status and on the standardization 
of dimensions of equipment and clothing. As 
compared with peace-time, the conditions existing
at the time of the survey were to a certain extent 
abnormal, since the civilian population had already 
been depleted of territorials, reserves, and many fit 
volunteers. It is unlikely, however, that these 
factors can have appreciably affected the conclusions 
derived from our analysis, since no significant 
difference in stature between volunteers and con- 
scripts in 1941 was found, and there is no reason 
to suppose that any existed earlier. 

It is, in fact, reasonable to suppose that our 
conclusions apply adequately to the whole of the 
Scottish working population. While our analysis 
dealt predominantly with men under the age of 40 
(only 44 over that age being included), the decline 
in stature which occurs between the period of 
middle-age and retirement is so small that it can be 
ignored for all practical purposes. To estimate 
the distribution of stature of adult men in a definite 
area, it is necessary to take account of the distribu- 
tion of stature and of the number of men in each 
social class in the area. For many purposes it 
will be safe to assume that the proportional strength 
of the social classes within the area is the same as in 
the whole population. Excluding the areas of unusual geographical variation in stature, the weighted mean stature of the “pooled national” sample in Scotland is 66·79” (s.d. 2·61”). This figure takes account of the relative number in each social class. Where selection is likely because of social or economic factors, it will be necessary to know the body dimensions found within the occupational orders of a particular social class. For example, skilled workers in the engineering industry are placed in Social Class 3. The planning of machinery controls which they will use should therefore be related to the body dimensions characterizing this class. Again, in the ready-made clothing industry, some materials have a limited and defined market because of price and quality.
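The recipe described here combines each class's stature distribution with the number of men in each class in the area. It can be written as a weighted mixture of class-specific distributions. The sketch below is a minimal version and assumes each class's stature is roughly normal. The class means, standard deviations and area counts in the example are illustrative placeholders, not figures from the survey. The function names are hypothetical.

```python
import math


def mixture_mean_sd(weights, means, sds):
    """Mean and s.d. of stature for an area, weighting each class by its numbers."""
    total = sum(weights)
    props = [w / total for w in weights]
    mean = sum(p * m for p, m in zip(props, means))
    # Law of total variance: average within-class variance plus spread of class means.
    var = sum(p * (s ** 2 + (m - mean) ** 2) for p, m, s in zip(props, means, sds))
    return mean, math.sqrt(var)


def share_taller_than(height, weights, means, sds):
    """Proportion of the area's men taller than `height`, each class taken as normal."""
    total = sum(weights)
    share = 0.0
    for w, m, s in zip(weights, means, sds):
        z = (height - m) / s
        share += (w / total) * 0.5 * math.erfc(z / math.sqrt(2))
    return share


# Illustrative placeholders for Classes 1/2, 3, 4 and 5 in a hypothetical area.
area_counts = [300, 1700, 650, 950]
class_means = [68.0, 66.9, 66.5, 66.5]
class_sds = [2.6, 2.6, 2.6, 2.6]

mean, sd = mixture_mean_sd(area_counts, class_means, class_sds)
print(f"area mean = {mean:.2f} in., s.d. = {sd:.2f} in.")
print(f"share over 70 in. = {share_taller_than(70.0, area_counts, class_means, class_sds):.3f}")
```

Because the between-class spread adds to the within-class variance, the area s.d. can never fall below the common class s.d.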

The mean statures in the heterogeneous areas, 
excepting those of Paisley and Glasgow, are above 
average. The statistics derived from the national 
sample of Class 1/2 will provide appropriate 
estimates of the distributions for all the classes in 
these areas. 

The population covered by the medical boards 
of Glasgow and Paisley is large and justifies separate 
consideration. For these boards the mean stature 
of the sample of Classes 4 and 5 combined is 65·44” (standard deviation 2·72), and for all the social classes combined it is 66·49” (standard deviation 2·80).


SUMMARY 


(1) The medical records of the Ministry of Labour and National Service for the first quarter of 1941 have been sampled. Variations in stature associated with geographical region, social class, age, and occupation have been examined.

(2) The mean stature of the whole sample is 66·82” ± 0·04 (N = 3,692).

(3) Significant differences exist between mean statures in different social classes. Means (in inches) in Classes 1 (professional) and 5 (casual labour) were 68·93” ± 0·37 and 66·44” ± 0·10 respectively.

(4) The mean does not increase after the 19th 
year, and a steady decline, which is similar for 
each social class, begins immediately full stature 
is reached. 

(5) Within the same social class there is little 
variation in stature associated with geographical 
area or occupation, and no significant differences 
have been noted between men classified as medically 
fit and unfit, or between volunteers and conscripts. 
INCIDENCE OF NEUROSIS RELATED TO MATERNAL AGE
AND BIRTH ORDER 


ALAN NORTON 


From Bexley and Lewisham Hospitals 


Order of birth and the strongly associated variable 
maternal age have often been investigated as 
possible determinants of a variety of congenital 
abnormalities. Penrose (1934) and Malzberg (1950) studied mongolism, and Still (1927) and McKeown, MacMahon, and Record (1951) studied congenital pyloric stenosis. Their work has pointed to some of the difficulties.

Birth order has also frequently been examined 
in studies of intelligence, juvenile delinquency, 
psychosis, and epilepsy, and after allowance has 
been made for the pitfalls in such investigations, a general impression remains of a handicapping of the first-born.

In psychiatric work, the emphasis shifts somewhat 
from the order of birth to the position in the family. 
This has long been looked upon as one of the 
influences moulding character. The schools of 
Freud and Adler in particular have always laid 
stress on this factor, and the tacit assumption has 
been made that it is also of aetiological importance 
in the production of mental and nervous disorder. 
If this is so, the decrease in mean family size from Victorian times to the present may be important. It fell from 5·71 to 2·19 (vide Report of the Royal Commission on Population, 1949). Despite a wealth of
anecdotal material there is a surprising paucity 
of recorded factual evidence. 

The present investigation attempts to answer the 
question whether the incidence of neurosis is 
related to maternal age and order of birth. The 
incidence of neurotic illness in only children, and in 
youngest, intermediate, and eldest children in 
families with more than one child, is also examined, 
as is the association with father’s age, age differences 
between parents, and loss of one or both parents. 

Parts of this field have been surveyed in the last 
thirty years, particularly by Holmes (1921), Jones 
(1933), Thurstone and Jenkins (1931), Hsiao (1931), 
and Miller (1944), but much new material has been 
published since these reports were compiled. 


METHOD 
The investigation of the association between the incidence of neurosis and maternal age and birth order relies upon a comparison of a series of patients with a control group. Information was obtained about size of family as well as about position in family. The sibships were also presumably complete at the time of enquiry. The method of Greenwood and Yule (1914) is therefore also used in examining the association of incidence with birth order.


(a) Patients—A random sample of 500 patients of 
fifteen years of age or more attending the psychiatric 
department of a general hospital was chosen. Most were 
suffering from psychoneurosis or personality disorders. 
Patients with florid psychoses and organic mental disorder 
were excluded, but the series does contain a few mild 
endogenous depressives, accounting for less than 10 per 
cent. of the total. The information was in most cases 
provided by the patients themselves, but was given in 
some by a relative. There seems no reason to suppose 
that there is any difference in reliability between informa- 
tion provided by patients and by controls. The age of 
the patient recorded is that given on his first attendance. 
The period covered is July, 1947, to July, 1948. Some 
details were also noted from the case-sheets of a further 
2,000 patients who attended the department between 
1943 and 1947. 


(b) Controls.—Information was obtained from a 
series of 500 controls, matched for age, sex, and social 
class, who were selected from the general in-patients of 
the hospital. Patients referred for a psychiatric opinion 
while in hospital were rejected, as were all patients 
admitted for skin and neurological disorders. It took 
much longer to amass the requisite number of controls 
because at the outset the extreme difference in age- 
distribution between the two types of patient was 
unsuspected. The period covered in collecting information about the control series was from December, 1947, to March, 1949.


Data.—The information recorded for both groups 
was as follows: 


Propositus: Age, sex, marital status, birth rank. 
Mother: Age at birth of propositus; if dead, age at 
death and year of death. 

Father: Age at birth of propositus; if dead, age at death and year of death.


Family Size: At the time of the enquiry all families are 
believed to have been complete. 
TABLE I

NUMBERS OF PATIENTS AND CONTROLS* AT DIFFERENT BIRTH RANKS AND MATERNAL AGES

NEUROSIS, MATERNAL AGE, AND BIRTH ORDER


Other information, of perhaps greater interest to the 
psychiatrist (for example—family history, social condi- 
tions in the home, emotional relationships, early life, 
etc.), was excluded because of doubt about its reliability. 


MATERNAL AGE AND BIRTH ORDER 


Table I (opposite) gives the distribution of patients and controls by maternal age and birth rank. Table II shows that the proportion of patients in the upper birth ranks is greater than that of controls. Table III shows that the proportion of patients born to mothers at higher maternal ages is greater than that of controls.

Birth rank and maternal age are, of course, highly correlated. An attempt was therefore made to separate their associations with neurosis by examining distributions

(a) at different birth ranks when maternal age is fixed,

(b) at different maternal ages when birth rank is fixed.

On the whole the association with maternal age was more definite than the association with birth rank, but the results are inconclusive.

The data recorded included family size as well as the birth rank of the propositus. Since almost all the sibships were believed to be complete, the opportunity was taken to examine the association with birth order by the method of Greenwood and Yule (1914). As controls are unnecessary in this case, all 2,500 patients for whom this information was recorded have been included. The association with maternal age cannot be examined in the same way, because the mother’s age at the birth of siblings was not noted.

Table IV gives the distribution of 2,500 patients by birth rank and size of family. Table V (overleaf) confirms the result noted in Table II: the observed proportion of patients in the higher birth ranks is greater than the expected proportion.

The Greenwood-Yule method has also been applied to the 500 controls. This confirms that they are evenly distributed between the different birth ranks.
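The Greenwood-Yule expectation needs only family sizes. Take a patient from a sibship of s children. If birth rank has no effect, that patient is equally likely to occupy any of the s ranks. Each such patient therefore contributes 1/s to the expected count at every rank from 1 to s. The sketch below applies this rule to the family-size totals in the bottom row of Table IV, ignoring twins. For ranks 1 to 5 it gives about 794·7, 568·7, 389·7, 265·0 and 178·0. For rank 6 and over it gives 304·0. These are close to, but not identical with, the Expected column of Table V. The paper does not describe its arithmetic in detail, so this is a sketch of the principle rather than a reproduction. The function name is hypothetical.

```python
# Patients in sibships of size 1, 2, ..., 19 (bottom row of Table IV).
family_totals = [226, 358, 374, 348, 335, 225, 178, 139, 118, 84,
                 33, 30, 25, 9, 9, 2, 3, 3, 1]


def greenwood_yule_expected(totals):
    """Expected patients at each birth rank if, within a sibship of size s,
    every rank 1..s is equally likely (probability 1/s). Index 0 is unused."""
    expected = [0.0] * (len(totals) + 1)
    for size, count in enumerate(totals, start=1):
        for rank in range(1, size + 1):
            expected[rank] += count / size
    return expected


expected = greenwood_yule_expected(family_totals)
grouped = expected[1:6] + [sum(expected[6:])]   # ranks 1-5 and "6 and over"
observed = [751, 553.5, 369, 266.5, 207, 353]   # Table V, observed column
for label, e, o in zip(["1", "2", "3", "4", "5", "6 and over"], grouped, observed):
    print(f"rank {label:>10}: expected {e:6.1f}, observed {o:6.1f}")
```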
TABLE II

DISTRIBUTIONS BY BIRTH RANK

Birth       | Patients (a)        | Controls (b)        | Difference
Rank        | Number  Per cent.   | Number  Per cent.   | (a) − (b)
1           | 141                 | 159                 |
2           | 110     66·2        | 121     74·0        | −7·8 ± 2·9
3           | 80                  | 90                  |
4           | 49                  | 42                  |
5           | 51      33·8        | 28      26·0        | 7·8 ± 2·9
6 and over  | 69                  | 60                  |
Total       | 500     100         | 500     100         |

The percentages refer to birth ranks 1-3, and to birth ranks 4 and over, taken together.
χ² = 10·05, n = 5, 0·05 < P < 0·10.
Mean family size is approximately the same for patients (4·9) as for controls (4·8).

TABLE III

DISTRIBUTIONS BY MATERNAL AGE

Maternal      | Patients (a)        | Controls (b)        | Difference
Age (yrs.)    | Number  Per cent.   | Number  Per cent.   | (a) − (b)
Under 23      | 52                  | 70                  |
23-27         | 132     62·2        | 138     69·6        | −7·4 ± 3·0
28-32         | 127                 | 140                 |
33-37         | 105                 | 82                  |
38-42         | 69      37·8        | 54      30·4        | 7·4 ± 3·0
43 and over   | 15                  | 16                  |
Total         | 500     100         | 500     100         |

The percentages refer to maternal ages under 33, and 33 and over, taken together.
χ² = 8·11, n = 5, 0·1 < P < 0·2.
TABLE IV

DISTRIBUTION OF 2,500 PATIENTS ACCORDING TO BIRTH RANK AND SIZE OF FAMILY

                                                 Size of Family
Birth
Rank     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   Total
1      226  172  124   89   41   36   24   11   11    6    5    0    2    2    1    0    1    0    0     751
2           186  128 90·5   59   32   26   11   10    4    1    2    4    0    0    0    0    0    0   553·5
3                122 77·5 72·5   39   16   18   11    4    4    2    1    1    1    0    0    0    0     369
4                      91 73·5   37   21   20    7    8    4    1    1    0    0    1    0    2    0   266·5
5                           89   40   24   17   19   13    3    0    2    0    0    0    0    0    0     207
6                                41   32   19   12    9    0    5    0    1    1    1    0    0    0     121
7                                     35   22   16   10    2    1    1    0    0    0    0    0    0      87
8                                          21   14    6    3    3    2    1    1    0    0    1    0      52
9                                               18    7    3    3    3    0    0    0    0    0    0      34
10                                                   17    3    3    1    0    0    0    0    0    0      24
11                                                         5    8    0    2    0    0    0    0    0      15
12                                                              2    2    1    0    0    0    0    0       5
13                                                                   6    0    0    0    0    0    0       6
14                                                                        1    1    0    0    0    0       2
15                                                                             4    0    0    0    0       4
16                                                                                  0    1    0    0       1
17                                                                                       1    0    1       2
Total  226  358  374  348  335  225  178  139  118   84   33   30   25    9    9    2    3    3    1   2,500

The decimal places are due to twins, which have been allocated as 0·5 to each of two successive birth ranks.
TABLE V

EXPECTED AND OBSERVED DISTRIBUTIONS BY BIRTH RANK OF 2,500 PATIENTS
(Greenwood-Yule Method)

Birth Rank  | Expected | Observed
1           | 793·8    | 751
2           | 572·8    | 553·5
3           | 388·8    | 369
4           | 264·2    | 266·5
5           | 177·2    | 207
6 and over  | 303·5    | 353
Total       | 2,500·3  | 2,500

χ² = 17·07, n = 5, P < 0·01.

TABLE VII

AGE OF PROPOSITI AT DEATH OF PARENTS

Age at Parents’ | Father (Per cent.)                      | Mother (Per cent.)
Death (yrs.)    | Patients (a) Controls (b) (a) − (b)     | Patients (a) Controls (b) (a) − (b)
0-4             | 3·8          1·4          2·4 ± 1·0     | 2·0          2·2          −0·2 ± 0·9
5-9             | 4·8          2·4          2·4 ± 1·2     | 1·6          1·8          −0·2 ± 0·8
10-14           | 3·8          3·8          0·0 ± 1·2     | 2·6          3·2          −0·6 ± 1·0
15-19           | 3·2          5·4          −2·2 ± 1·3    | 3·8          2·8          1·0 ± 1·1
20 and over     | 26·4         27·2         −0·8 ± 2·8    | 18·2         16·2         2·0 ± 2·4
Total           | 42·0         40·2         1·8 ± 3·1     | 28·2         26·2         2·0 ± 2·8
Parent alive    | 58·0         59·8         −1·8 ± 3·1    | 71·8         73·8         −2·0 ± 2·8
Total           | 100 (500)    100 (500)                  | 100 (500)    100 (500)

A subject of additional interest is the possible association between the incidence of neurosis and position in family (as distinct from birth order). Table VI shows no difference between patients and controls in the proportion of only children. Nor does it show any difference in the proportions of eldest, intermediate, and youngest children in families with more than one child.


TABLE VI

DISTRIBUTIONS OF PATIENTS AND CONTROLS ACCORDING TO POSITION IN FAMILY

Position of Child | Patients (a) | Controls (b) | Difference
in Family         | Per cent.    | Per cent.    | (a) − (b)
Only              | 9·0          | 8·4          | 0·6 ± 1·8
Eldest            | 19·2         | 23·4         | −4·2 ± 2·6
Youngest          | 25·0         | 24·7         | 0·3 ± 2·7
Intermediate      | 46·8         | 43·5         | 3·3 ± 3·1
Total             | 100 (500)    | 100 (500)    |

χ² = 2·8, n = 3, 0·3 < P < 0·5.


As might be expected in view of the correlation 
between mother’s and father’s age, the proportion 
of patients with older fathers is slightly higher 
than for the controls. 

The difference in age between the parents has 
also been found to be somewhat greater for the 
patients than for the controls, a conclusion similar 
to that of Glueck and Glueck (1934), Thurstone 
and Jenkins (1931), and Doshay (1943) in investi- 
gating groups of juvenile delinquents. 


LOSS OF A PARENT

Table VII shows that: 

(1) There is no difference between the proportion of 
patients who had lost a parent and the proportion of 
controls. 

(2) A significantly higher proportion of patients had 
lost a father before they were ten years old. 

(3) There is no significant difference between the age 
of patients at loss of mother and that of controls. 
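The "±" figures in Tables II, VI and VII appear to be standard errors of a difference between two percentages, each based on 500 people: $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$. The sketch below makes that assumption; the helper name is hypothetical. It reproduces the 7·8 ± 2·9 given in Table II for ranks 4 and over. It also combines the 0-4 and 5-9 rows of Table VII for early loss of the father. That gives a difference of 4·8 ± 1·5 per cent., over three times its standard error, in line with finding (2).

```python
import math


def diff_with_se(p1, p2, n1, n2):
    """Difference of two percentages and its standard error, both in per cent."""
    a, b = p1 / 100, p2 / 100
    se = math.sqrt(a * (1 - a) / n1 + b * (1 - b) / n2)
    return 100 * (a - b), 100 * se


# Father lost before age 10 (Table VII, rows 0-4 and 5-9 added together).
d, se = diff_with_se(3.8 + 4.8, 1.4 + 2.4, 500, 500)
print(f"father lost before 10: {d:.1f} ± {se:.1f} per cent, ratio {d / se:.1f}")

# Birth ranks 4 and over (Table II): patients 33.8 %, controls 26.0 %.
d2, se2 = diff_with_se(33.8, 26.0, 500, 500)
print(f"ranks 4 and over: {d2:.1f} ± {se2:.1f} per cent")
```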





Fourteen patients and eleven controls had lost 
both parents before reaching the age of twenty. 
No difference was found between patients and 
controls in the age at which this loss had occurred. 


These conclusions are surprising. The field 
covered by the literature is considerably wider and 
deals chiefly with the relationship between the 
“broken home” situation and such groups as 
juvenile delinquents and psychopathic characters. 


Bowlby (1951), for example, who has recently 
marshalled all the evidence, concludes firmly that 
there is a close relationship between early maternal 
deprivation and the development of affectionless 
characters. Barry (1939), too, in a psychotic group, found that the incidence of maternal deaths, but not of paternal deaths, was higher than in the general population. Of a group of neurotic soldiers,
Madow and Hardy (1947) found that 21 per cent. 
had lost one or both parents before the age of 
sixteen, loss of father being just as frequent as 
loss of mother. The lack of controls in these studies 
is a serious drawback, but the Statistical Bulletin 
of the Metropolitan Life Insurance Company (1944) 
states that 16·7 per cent. of the population of the United States have lost one or both parents before the age of eighteen. In the present series, 14·2 per cent. of the controls had lost one or both parents
before the age of fifteen, a figure in accordance 
with that quoted for the American population. 

The conclusions to be drawn from Table VII 
lend no support to the view that early loss of the 
mother is a cause of psychiatric abnormality in 
neurotic patients. The high figure for early loss 
of the father may perhaps conceal a higher rate of 
illegitimacy in patients than in controls, but it 
suggests that it is economic insecurity rather than 
emotional deprivation which is harmful. 


DISCUSSION 


The examination conducted here of a group of 
neurotic patients and of a group of controls leads 
to the conclusion that there is an association 
between neurosis and the higher birth ranks (four 
and upwards) and the higher maternal ages (33 and 
upwards). As regards maternal age there seem to be 
no comparable published figures, though Marro 
(1912) reported that both very young and very 
old parents tended to have more criminal and insane 
children than parents of between 21 and 36. 

As regards the relation between birth rank and 
neurosis, Abernethy (1940) reported that first-born 
children were less neurotic than those born later. 
This conclusion is contrary to that of Pearson (1914) 
who was of the opinion that the later members of 
the family after the first and second were sounder 
both mentally and physically. 

Most authors who have written on this topic have, 
however, been less concerned with birth rank than 
with “position in family”. Its importance in 
moulding character has been emphasized by many, 
particularly those of the Adlerian school. In this 
literature each position is linked with particular 
traits. For example, Adler (1928) remarks that 
the only child searches for support at all times; 
Wexberg (1930) says that the eldest may “carry the gesture of a dethroned monarch for the rest of its life”; Hill (1945) says that the youngest shows a
striving for power, or else evades duties and respon- 
sibilities and retreats. 


Most of these authors are not content to confine 
their generalizations solely to character traits, 
and assert that position in family also predisposes 
to neurosis and behaviour disorder. The negative 
findings of the present enquiry shown in Table VI 
are opposed by such statements as these: 

Wexberg (1930): Only children “whether as children or as adults constitute a large percentage of a psychiatrist’s patients”.

Hill (1945) of eldest children: “A large proportion develop into problem children,” and “a large number of youngest children fill the ranks of problem children, neurotics, and criminals.”

Brill (1922): “It would be best for the individual as well as the race that there should be no only children”, and “only childism is a bad disease”.


These views are based on long clinical experience 
and not upon controlled enquiry. 


Several authors, however, report negative findings: 

Fenton (1928) could find no reliable trait differences 
between various birth positions, nor significant correla- 
tion between birth order and emotional abnormality in 
groups of American college students. 
Stuart (1926) in a similar investigation found no evidence at all in favour of the hypothesis of emotional instability of the only child.

Thurstone and Thurstone (1930) remarked that “birth order is not so universally important a consideration in mental hygiene as is sometimes believed”.

The evidence presented in this paper implies 
that the decrease in the average size of the family 
in the past hundred years, which has reduced the 
numbers born in the higher birth ranks, may well 
have diminished in consequence the number of 
people prone to develop neurotic illness. Despite 
much medical and lay opinion to the contrary, 
there seems no reason to believe that the concomitant 
increase in the proportion of those born into a 
special position in the family—only, eldest, or 
youngest—has been harmful. 


SUMMARY 

(1) A comparison of 500 psychiatric patients 
with 500 controls matched for age, sex, and social 
class shows that a larger proportion of patients 
than of controls were in the higher birth ranks. 

(2) The proportion of patients born to mothers 
at higher maternal ages was greater than in the 
case of the controls. 

(3) An examination of the patients’ sibships, 
together with the sibships of a further 2,000 patients, 
was made by the Greenwood-Yule method. It 
confirms that the observed proportion of patients 
in the higher birth ranks was greater than the 
expected proportion. 

(4) No difference was found between 500 patients 
and 500 controls in the proportion who were only, 
eldest, youngest, or intermediate children. 

(5) Equal proportions of patients and controls 
had lost a parent; but a significantly higher propor- 
tion of patients had lost a father before the age 
of ten. No such difference was found in the age 
at loss of mother. 


My thanks are due to Dr. Henry Wilson for his 
interest and encouragement; to Dr. P. H. Tooley and 
the other members of the staff of the Psychiatric Depart- 
ment of the London Hospital, both medical and lay, 
for help in collecting the data; to the staff of the 
Almoner’s Department of the London Hospital, in 
particular to Miss Mary Wise, for collecting the control 
data. 
FACTORS INFLUENCING SEX DIFFERENCES IN MORTALITY
FROM RESPIRATORY TUBERCULOSIS IN ENGLAND AND WALES

BY

J. C. MCDONALD
Public Health Laboratory Service, Medical Research Council 


In Great Britain, men and women die from respira- 
tory tuberculosis at very different rates. After due 
allowance is made for variations in age structure, 
the death rate for men is much higher than for women. 
Even more striking is the difference between the curves 
exhibiting age-specific death rates in the two sexes. 
The male curve is now characterized by a slow rise 
from early adult life to a peak at about 60 years and 
then a decline to old age. The female curve, in contrast, 
rises steeply to a high peak between 20 and 30 years and 
then falls away almost as rapidly. 

The aim of this paper is to examine some of the 
many factors which may be responsible for these 
different mortality patterns. 

Investigation has been confined to the examination 
of existing records, all of which have been taken from 
various volumes of the Registrar-General’s Statistical 
Review of England and Wales, and from his Decennial 
Supplement for 1931. From these tables, numbers of 
deaths from respiratory tuberculosis by age, sex, 
social class, and occupation have been extracted for 
different years. The populations of each group were 
obtained from the same source. Age-specific death 
rates were calculated from these figures in various 
population groups. In a few instances the calculation 
had already been made by the Registrar-General. 

The usefulness of this method of study is limited by 
the fact that mortality records are the only indices of 
respiratory tuberculosis used. Death is the terminal 
event in a disease which may have been going on for 
many years. A description of the circumstances present
at death may bear little or no relation to those which
set the morbid process in motion. This
is particularly true of circumstances leading to the 
primary infection with the tubercle bacillus. Tuberculin 
testing surveys have shown that no significant difference 
exists between the percentage of male and female 
reactors (McDougall, 1949a). This fact suggests that 
mortality differences may be attributed either to some 
inherent difference between the sexes, to unequal 
chances of reinfection, or to other environmental 
inequalities. 


HISTORICAL TRENDS.—Tuberculosis mortality has
been falling for at least a century, and Fig. 1 gives
evidence of the very considerable improvement which
has taken place during the past 50 years.
The standardized mortality per million for the years
1851-60 was 2,694 for men and 2,854 for women. By
1939 these rates had dropped to 556 and 404
(Registrar-General, 1947), a reduction to one-fifth
of the male rate and to one-seventh of the female rate.
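The arithmetic behind "one-fifth" and "one-seventh" can be checked from the four standardized rates quoted above. A minimal Python sketch; the function names are illustrative and are not taken from the Registrar-General's reports:

```python
# Standardized mortality from respiratory tuberculosis per million living,
# as quoted in the text: (rate for 1851-60, rate for 1939).
RATES = {
    "men": (2694, 556),
    "women": (2854, 404),
}


def reduction_factor(before: float, after: float) -> float:
    """Return how many times larger the earlier rate is than the later one."""
    return before / after


def percent_fall(before: float, after: float) -> float:
    """Return the fall in the rate as a percentage of its earlier value."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for sex, (early, late) in RATES.items():
        print(
            f"{sex}: {early} -> {late} per million, "
            f"{reduction_factor(early, late):.1f}-fold reduction, "
            f"{percent_fall(early, late):.0f}% fall"
        )
```

The factors come out at about 4·8 for men and 7·1 for women, that is, falls of roughly 79 and 86 per cent, which is what "one-fifth" and "one-seventh" summarize.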
Fig. 1.—Phthisis mortality (1900) and respiratory tuberculosis
mortality (1948).
(Sources: 63rd Annual Report of Registrar-General for 1900,
Table I, p. xxxi. Registrar-General’s Statistical Review for
1948, Part I, Table 2, p. 2, and Table 24, p. 180.)
These historical changes have been studied in detail
by Hart and Wright (1939), who state:

The mortality for most age groups in England and 
Wales has declined steadily since 1870, except during 
the war period 1914-18. 

For young adults the mortality declined satis- 
factorily until the beginning of the present century, but 
a retardation set in about 1900-4 for young women 
and about 1912-14 for young men. This retardation 
has been particularly severe amongst young women 
whose mortality in 1931-33 was scarcely lower than it 
had been in 1901-3. 

The retarded decline in mortality from respiratory 
tuberculosis amongst young adults has been most 
serious in the large urban areas of the country, i.e. 
the metropolitan and county boroughs. 

It is against a background of falling death rates at 
all ages and in both sexes that the present patterns of 
mortality must be considered. Even the young female 
peak figures are no exception to this downward trend,
though they exhibit it least. The shape of the male
curve would seem well explained by the “cohort”
hypothesis of Frost (1939), and, though it is less
apparent, the female curve must be influenced by the
same mechanism. McDougall (1949b) notes that there
exists

a very constant correlation in successive generations
between infant mortality from tuberculosis and the
rates for the remainder of life in the same generation.

Childhood mortality reflects closely the weight of 
infection to which that generation of children is 
subjected. The bulk of this infection must originate 
from infected adults in the child’s environment. 

It appears probable that once a downward trend is 
established, as at present, the mortality from tubercu- 
losis will continue to fall until for some reason a 
given number of infected cases manage to infect a 
higher number of children of the next generation. 
Assuming no change in the nature of the tubercle 
bacillus or in the innate susceptibility of the human 
host, environmental conditions would seem to be the 
factors which determine an upward or downward 
trend. 

The problem which remains is to explain why, in 
spite of the universal downward trend, young women 
fare less favourably than young men, and older 
women much more favourably than older men. 

Springett (1950) has helped to answer this question 
by showing that for purely mathematical reasons the 
peak rate of mortality shifts to an older age group 
only when the curve is rounded or plateau in type and 
not when steeply rising and sharply peaked. Young 
adult females experience an earlier and more sharply 
rising mortality from tuberculosis than do young 
adult males. It has been suggested that this is due to 
fundamental physiological sex differences. If there is
indeed a difference of this kind in the basic curve for
the two sexes it would explain, at least in part, why 
older men fare less well than older women. It seems 
reasonable to think that other factors may also play 
a part and the possible influence of certain of these 
will now be examined. 


INFLUENCE OF URBANIZATION.—The mortality pat- 
terns of men and women living in London, large cities, 
small towns, and rural districts are shown in Fig. 2. 
In both 1911 and 1938, rural districts are seen to have 
the lowest rates. In 1911, London had the highest 
tuberculosis mortality and the county boroughs next; 
in 1938, these positions were reversed. 

The curve for females has much the same shape in 
both years, and the degree of urbanization appears to 
influence its shape and level very little. In contrast, 
the male curve is quite different in rural and urban 
areas. In both years the male and female patterns are 
most alike under rural conditions. They are least 
alike in London and the county boroughs, and the 
difference is greatest in the 50-65 year age group, 
that is, in the latter half of the working life. 
Fig. 2.—Phthisis mortality (1911), respiratory tuberculosis
mortality (1938), and urbanization.
(Sources: 74th Annual Report of Registrar-General for 1911,
Table LIV, p. lxxi. Registrar-General’s Statistical Review for
1938 and 1939, Text Table XLVII, p. 72.)


TABLE I
RESPIRATORY TUBERCULOSIS MORTALITY AND SOCIAL CLASS, 1930-2
(Rates calculated per 100,000 living for the 3-yr period 1930-2)

Age                    16-  20-  25-  35-  45-  55-  65-  70-  75-
Males
  Total                199  319  325  383  454  377  269  177   87
  Social Class I       196  165  188  261  242  264  211  142  142
  Social Class II      148  212  231  283  312  258  211  165   80
  Social Class III     182  313  330  375  386  273  272  179   80
  Social Class IV      166  312  323  416  497  377  249  176  105
  Social Class V       228  359  363  488  605  518  398  270  138
  Unoccupied           337  582  686  690  359  178  102    —   31
Married females
  Total                306  315  262  199  145  125  103   82   64
  Social Class I         —    —  124   91   84   85    —    —    —
  Social Class II        —  199  173  131   99   94   84  132   68
  Social Class III     295  309  257  196  145  127  110   62   67
  Social Class IV      260  320  273  207  163  126   97   81   78
  Social Class V       396  379  342  271  197  160  138   95    —
  Unoccupied             —    —  105   70   26   17    —    —    —
Single females
  Total                308  379  371  250  170  132  125   94   68
  Social Class I         —    —  199  454    —    —    —    —    —
  Social Class II      188  202  204  124   91   71  115    —    —
  Social Class III     223  321  347  245  176  168  152  106  129
  Social Class IV      329  453  445  286  193  130    —    —    —
  Social Class V       319  444  346  299  273  210    —    —    —
  Unoccupied           549  596  493  302  188  114  109   84   43

Rates not calculated for any age group in which there were fewer than ten deaths.
Source: Registrar-General’s Decennial Supplement for 1931, Part IIA, Occupational Mortality, Tables 4A, 4B, 4C, pp. 215-325.

The figures available throw no light on the effect 
of urbanization on the mortality experience of young 
adults. In studying the difference between the sexes, 
one is liable to overlook the great similarity of the 
steep rise in mortality which takes place between 
15 and 25 years of age. 

Urbanization seems closely associated with the 
general level and pattern of male mortality but has very 
little influence on either the level or pattern of female 
mortality. Since urbanization itself would seem to be 
an environmental factor of equal importance to both 
men and women, the cause of the difference cannot be
urbanization but something associated with it. 
Dahlberg (1949) has recently suggested that urban 
employment is the responsible factor, and it is interest- 
ing to note that John Snow (1855) reached the same 
conclusion. 


INFLUENCE OF SOCIAL CLASS.—Fig. 3, based on
Table I, confirms the generally accepted view that 
tuberculosis mortality is closely associated with socio- 
economic conditions. The male and female curves are 
similar in shape in the five social classes, but the level 
at which they are set rises as we descend the social 
scale. 

Single women have consistently higher rates than 
married women, particularly in the younger age groups. 
The curves for single and married women are never- 
theless very similar. Selection can be held responsible 
for the higher rate for single women, who, if tubercu- 
lous, are less likely to marry. The great similarity 


between the mortality experience of married and single 
women does suggest that maternity is not the main 
cause of the high female mortality in young adult life. 
Fig. 3.—Respiratory tuberculosis mortality and social class
(1930-32), based on Table I.
TABLE II
RESPIRATORY TUBERCULOSIS MORTALITY AND OCCUPATION, 1930-2
(Rates calculated per 100,000 living for the 3-yr period 1930-2;
rates within each column are listed in age order)

Occupational Group (Social Class)          Status Ages 16-, 20-    Ages 25- to 75-
Farmers and their relatives (II)           M      69, 90           140, 148, 133, 134, 101, 110
                                           W      —, 160           172, 117, 105, 76, 89, 179
Agricultural and garden labourers (IV, V)  M      72, 173          186, 208, 197, 183, 135, 118, 104
                                           W      214              212, 162, 135, 114, 87
Coal hewers and getters (III)              M      233, (illegible) 264, 261, 422, 384, 291, 153
                                           W      706, 531         396, 238, 147, 159, 129
Iron and steel foundry furnacemen (IV)     M      310, 582         397, 454, 763, 430
                                           W                       391, 293
Employers and managers (II)                M      —, 306           257, 294, 349, 385, 354, 280, 234
                                           W                       not available
Boot and shoe factory operatives (III)     M      400, 746         697, 692, 724, 570, 643, —
                                           W      —                400, 378, 168, 190, —
Carpenters (III)                           M      153, 209         231, 306, 346, 334, 276, 153, —
                                           W      255              234, 202, 118, 118, 159
Water transport: dock labourers (V)        M      638, 914         416, 722, 959, 813, 696, 613
                                           W      —, 371           361, 304, 168, 136
Retail proprietors: grocery (II)           M      —, —             246, 316, 255, 231, 141
                                           W      —, —             190, 137, 114
Retail proprietors: dairy, meat, fish,     M      —, 349           327, 366, 422, 281, 281
  greens (II)                              W      —, 310           220, 183, 109, 88
Commercial travellers (II)                 M      —, 162           294, 373, 449, 363, 214
                                           W      —, —             181, 125, 90
Bank and insurance officials, clergy,      M      —                142, 203, 188, 166, 212, 162, 113
  physicians, engineers (I)                W                       90, 77, 86, 88
Typists and other clerks, not Civil        M      181, 364         400, 482, 550, 425, 336, 213, —
  Service (III)                            W                       not available
General labourers (V)                      M      278, 619         414, 538, 656, 549, 449, 307, 165
                                           W      496, 440         381, 318, 222, 180, 138, 83
Textile workers: cardroom, etc. (III, IV)  S      313, 291         244, —
Textile workers: spinners, etc. (III)      S      470, 685         548, 636, —
Textile workers: weavers (III)             S      327, 299         280, 226, 107
Dressmakers, glovemakers (III)             S      256, 492         457, 298, 158, 217, —
Milliners, hatformers, sewers, etc. (IV)   S      318, 542         488, 420, 606, 912
Midwives, sick nurses, etc. (III)          S      172, 233         378, 214, 124, 102
Teachers, not music (II)                   S      —, 127           198, 130, 86, 65
Indoor domestic servants (III)             S      171, 277         316, 243, 178, 159, 146, 125, 130
Charwomen, office cleaners (V)             S      —, 306           176, 190, 237, 270, —

Rates not calculated for any age group in which there were fewer than ten deaths.
M = men; W = wives; S = single women.
Source: Registrar-General’s Decennial Supplement for 1931, Part IIA, Occupational Mortality, Tables 4A, 4B, 4C, pp. 215-325.


It is assumed, of course, that the frequency of 
pregnancy is lower among single than married women 
and that the risk of tuberculosis accompanying
pregnancy is the same for both. 

Individuals are assigned to their social class by virtue 
of the nature of their occupation or, for married women,
by the occupation of their husbands. Thus there
remains a sixth group: the unoccupied. One might
expect that as tuberculosis is a disabling disease, the 
unoccupied would have a high mortality rate, and 
Fig. 3 shows that this is indeed so. 

Social class and all that goes with it appear to have 
a great influence upon the general level of mortality 
from tuberculosis in both men and women, but cannot 
be held responsible for the differences between the 
sexes. Broadly speaking, men and their wives are 
subject to the same socio-economic environment, but 
the male and female curves are widely different in all 
five social groups. 
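The contrast can be made concrete by dividing each male rate in Table I by the rate for married women of the same class at the same age. The Python sketch below does this for the totals and, as examples, Classes III and V; the choice of classes and the helper `sex_ratios` are ours, not part of the original analysis.

```python
from typing import List, Optional

AGES = ["16-", "20-", "25-", "35-", "45-", "55-", "65-", "70-", "75-"]

# Rates per 100,000 living, 1930-2, transcribed from Table I.
# None marks a rate that was not calculated (fewer than ten deaths).
MALES = {
    "Total": [199, 319, 325, 383, 454, 377, 269, 177, 87],
    "Class III": [182, 313, 330, 375, 386, 273, 272, 179, 80],
    "Class V": [228, 359, 363, 488, 605, 518, 398, 270, 138],
}
MARRIED_WOMEN = {
    "Total": [306, 315, 262, 199, 145, 125, 103, 82, 64],
    "Class III": [295, 309, 257, 196, 145, 127, 110, 62, 67],
    "Class V": [396, 379, 342, 271, 197, 160, 138, 95, None],
}


def sex_ratios(
    male: List[Optional[int]], female: List[Optional[int]]
) -> List[Optional[float]]:
    """Male rate divided by female rate at each age; None where either is missing."""
    return [
        None if m is None or f is None else m / f
        for m, f in zip(male, female)
    ]


if __name__ == "__main__":
    print("Age:      " + " ".join(f"{a:>5}" for a in AGES))
    for group in MALES:
        ratios = sex_ratios(MALES[group], MARRIED_WOMEN[group])
        cells = ["    -" if r is None else f"{r:5.2f}" for r in ratios]
        print(f"{group:<10}" + " ".join(cells))
```

For the totals the ratio is about 0·65 at 16- and over 3 at 45-; Classes III and V show the same kind of rise, which is the sense in which the curves differ within every class.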


INFLUENCE OF OCCUPATION.—In studying the
occupational mortality figures in Table II, one is struck
by the very great variety of patterns which they describe.
With one or two exceptions, however, their essential
character is constant, the male rate rising and the
female rate falling with advancing age up to the end of
the working life. After this point, the male mortality
in most of the occupational groups improves.

The exceptions, which are of particular interest, are 
farmers and their relatives, and agricultural and 
garden labourers (Fig. 4, opposite). Only in these 
groups does any great similarity in pattern exist 
between the sexes. There is considerable difference 
in the economic level of these two groups, but they 
both work in the country and, with the possible 
exception of coal hewers and getters, are the only 
groups of rural workers examined. 


Coal miners are interesting for two reasons: first,
their wives have such high death rates under the
age of 35; and second, the male rate shows
every sign of falling until it rises suddenly in the
40-45 age group. One wonders whether, but for the
specific hazard of pneumoconiosis in this trade, the
male rates might not have been similar to those of
the other rural occupations.

Fig. 4.—Respiratory tuberculosis mortality and occupation
(1930-32), based on Table II. Top: farmers and their relatives;
below: agricultural and garden labourers.

A mixed group of occupations in Social Class I shows
a striking difference between the general level of the 
male and female mortality. The upward gradient 
during the working life of the male is very slight. 
Several occupations in Social Class II are shown and, 
with the possible exception of grocers, their favourable 
socio-economic position does not result in the male 
and female patterns being any more alike. 

The high level of tuberculosis mortality of boot and 
shoe workers has been investigated by Stewart and 
Hughes (1949), who considered that the factors 
probably responsible were selective recruitment and 
working conditions. They observed that tuberculosis 
rates were not related to environmental working con- 
ditions judged by the usual criteria but did appear to 
be related to the number of workers per room, large 
groups faring worse than small groups. 

The wives of general labourers share with the wives 
of miners a high mortality rate in the 16-20 age group. 
No evidence is available to account for this, although 
one may guess that the woman who is married before 
the age of 20 is subject to considerable stress, particu- 
larly if her husband has a more than average liability 
to tuberculosis, and if her economic position is poor. 

Selection, small numbers, and the lack of variety of 
occupations make it dangerous to deduce anything 
from the rates for single women. In most groups the 
pattern is of the typical female type, but milliners, 
hat formers, sewers and trimmers, and charwomen 
and office cleaners exhibit some similarities to the male
type. 

In what way does urban employment differ from 
rural employment and so increase the mortality rate of 
men in comparison with their wives? Socio-economic 
qualities such as income, nutrition, and education
may be dismissed because they are not points of 
difference between rural and urban employment. 
Population density is greater and available sunshine 
less in the cities, but these hazards are shared by men 
and women. Terris (1948) suggested that physical over- 
strain might be an important factor, but there seems 
no reason to think that physical strain is greater in 
urban than in rural occupations. Mental stress, on 
the other hand, may be greater in urban conditions of 
work, but there is no obvious connection between 
this factor and tuberculosis. 

We are left with conditions inherent in industrial 
employment to which men are exposed more than their 
wives. Industrial air pollution might in some way 
lower resistance to infection or increase the liability 
to breakdown of a quiescent lesion, but existing
evidence does not support this explanation. Working 
conditions in our cities and towns have improved in 
many ways, but the risk of respiratory infection is 
still great. Large groups of working people come 
together day after day in the confined space of the 
workshop and canteen; there is often crowding and the 
amount of sunlight and ventilation may be far from 
ideal. In any large group there may be a case of open 
pulmonary tuberculosis to whom the same people 
are exposed over long periods. There is the opportunity 
for infection and superinfection, the risk increasing 
with duration of exposure. Men are subjected to 
these conditions throughout their working lives, and 
women, for the most part, only until they are married. 


CONCLUSIONS 

(1) The pattern of female mortality remains fairly 
constant under varied conditions of socio-economic 
environment and urbanization. Employment of a few 
types appears to modify this pattern. Socio-economic 
factors appear to determine the general level. 

(2) The pattern of male mortality is influenced by 
the degree of urbanization and the nature of occupation, 
but not by social class. The general level of the curve 
is influenced by all three factors. 

(3) It is suggested that the increasingly unfavourable 
mortality experience of occupied males with advancing 
years, in comparison with their wives, is due in part to 
certain qualities of urban employment. Frequent close 
contact with many people over long periods seems to 
be the most likely factor and presumably acts by in- 
creasing the risk of exposure to infection and 
reinfection. 
A NOTE ON THE SEX RATIO IN ANENCEPHALUS


BRIAN MACMAHON* and THOMAS McKEOWN 


From the Department of Social Medicine, University of Birmingham 


While exploring the association of the sex ratio 
of stillbirths with cause and duration of gestation, 
McKeown and Lowe (1951) noted that for anencephalus
the sex ratio increased with duration of gestation.
No explanation was offered for this observation, 
which is here considered briefly. 

In Birmingham, during the years 1940-51, 454 
stillbirths and 28 infant deaths were attributed to 
anencephalus (440 cases) or to anencephalus and 
spina bifida (42 cases). Duration of gestation was 
recorded in 475 of the 482 cases, for which the
association of sex ratio with duration of gestation
was as follows:





Duration of Gestation (weeks)   Under 33     33-37        38 and over  Total
Sex Ratio (per cent. male)      23·7 (139)   27·3 (194)   42·3 (142)   30·7 (475)





In Table I the distribution of anencephalics by 
duration of gestation is shown to be associated with 
birth rank, the first born being delivered earlier 
than the later born. Sex ratios of anencephalics of 
birth ranks “1” and “2 and over” are respectively
26·3 and 35·2 (difference: 8·9 ± 4·2), which
suggests that the association of duration of gestation 
with birth rank may account for the association of 
sex ratio with duration of gestation. 
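The quoted standard error can be reproduced from the counts implied by the percentages: 26·3 per cent of 243 first-born anencephalics is 64 males, and 35·2 per cent of the 230 of higher birth rank is 81 males. These male counts are our back-calculation, not figures printed in the note, and the use of the pooled proportion in the standard error is likewise an assumption. A minimal Python sketch:

```python
from math import sqrt


def difference_with_se(males_a, total_a, males_b, total_b):
    """Per cent male in group b minus group a, with its standard error.

    The standard error uses the pooled proportion male, i.e. it is computed
    under the hypothesis that both groups share the same sex ratio.
    """
    p_a = males_a / total_a
    p_b = males_b / total_b
    pooled = (males_a + males_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return 100 * (p_b - p_a), 100 * se


if __name__ == "__main__":
    # Birth rank 1 versus birth rank 2 and over (counts back-calculated).
    diff, se = difference_with_se(64, 243, 81, 230)
    print(f"difference = {diff:.1f} +/- {se:.1f} per cent")
```

The difference is a little over twice its standard error.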

TABLE I
PERCENTAGE DISTRIBUTION OF ANENCEPHALICS BY
DURATION OF GESTATION AND BIRTH RANK

Birth          Duration of Gestation (weeks)
Rank           Under 33     33-37        38 and over  Total
1              38·3 (93)    41·1 (100)   20·6 (50)    100·0 (243)
2              22·8 (23)    42·6 (43)    34·6 (35)    100·0 (101)
3              19·6 (9)     34·8 (16)    45·6 (21)    100·0 (46)
4 and over     16·9 (14)    41·0 (34)    42·1 (35)    100·0 (83)
Total          29·4 (139)   40·8 (193)   29·8 (141)   100·0 (473)

In two cases (of 475) birth rank was unknown.
χ² = 30·34, n = 6, p < 0·01
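The χ² quoted beneath Table I is Pearson's statistic for a 4 × 3 contingency table, with (4 − 1)(3 − 1) = 6 degrees of freedom. A minimal Python sketch that recomputes it from the counts in parentheses; the function names are our own, and the closed-form tail probability is valid only for even degrees of freedom, which happens to cover this table and Table III:

```python
from math import exp, factorial


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def upper_tail_even_dof(x, dof):
    """P(chi-square > x); exact closed form, valid only for even dof."""
    if dof % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(dof // 2))


# Counts from Table I: rows are birth ranks 1, 2, 3, 4 and over;
# columns are gestation under 33, 33-37, 38 and over (weeks).
TABLE_I = [
    [93, 100, 50],
    [23, 43, 35],
    [9, 16, 21],
    [14, 34, 35],
]

if __name__ == "__main__":
    stat, dof = chi_square(TABLE_I)
    p = upper_tail_even_dof(stat, dof)
    print(f"chi-square = {stat:.2f}, n = {dof}, p = {p:.1e}")
```

The statistic comes out at about 30·33, in agreement with the printed 30·34 to within rounding, and the tail probability is far below 0·01. Passing the counts of Table III, `[[59, 93, 30], [22, 22, 36]]`, gives about 25·4 with 2 degrees of freedom.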


In Table II it is shown that this explanation is 
unsatisfactory, since sex ratios increase with gestation 
for births of the same rank. 


* In receipt of a grant from the Medical Research Council. 
TABLE II
SEX RATIO OF ANENCEPHALUS RELATED TO DURATION
OF GESTATION AND BIRTH RANK

Birth          Duration of Gestation (weeks)
Rank           Under 33     33-37        38 and over  Total
1              22·6 (93)    26·0 (100)   34·0 (50)    26·3 (243)
2              26·1 (23)    30·2 (43)    51·4 (35)    36·6 (101)
3 and over     26·1 (23)    26·0 (50)    44·6 (56)    34·1 (129)





In seeking for a further explanation of the observa- 
tions we have considered the association of the 
malformation with hydramnios, a common compli- 
cation which results in early onset of labour 
(Table III). Birth records were traced for 262 of the
319 cases born in hospital; hydramnios was noted 
in 182 cases, but there was no information about 
the degree. 

TABLE III
DISTRIBUTION BY DURATION OF GESTATION OF
ANENCEPHALUS WITH AND WITHOUT HYDRAMNIOS

               Duration of Gestation (weeks)
Hydramnios     Under 33     33-37        38 and over  Total
Present        32·4 (59)    51·1 (93)    16·5 (30)    100·0 (182)
Absent         27·5 (22)    27·5 (22)    45·0 (36)    100·0 (80)
Total          30·9 (81)    43·9 (115)   25·2 (66)    100·0 (262)

χ² = 25·43, n = 2, p < 0·01.

Table IV shows that the increase in sex ratio
with duration of gestation is confined to cases in
which hydramnios is present; in cases without
hydramnios the sex ratio appears to decrease as
gestation increases. The fact that the sex ratios
of cases with and without hydramnios (29·7 and
32·5 respectively) are approximately the same sug-
gests that the association between sex ratio and
length of gestation is not explained by a higher
incidence of hydramnios in females than in males.
If hydramnios is directly responsible for the earlier
onset of labour in females, it presumably appears
earlier or is of greater severity than in males.

TABLE IV
SEX RATIO OF ANENCEPHALUS WITH AND WITHOUT HYDRAMNIOS

                   Duration of Gestation (weeks)
Hydramnios         Under 33       33-37          38 and over    Total
Present (a)        22·0 (59)      26·9 (93)      53·3 (30)      29·7 (182)
Absent (b)         45·5 (22)      27·3 (22)      27·8 (36)      32·5 (80)
Difference (a)−(b) −23·5 ± 11·3   −0·4 ± 10·5    25·5 ± 12·1    −2·8 ± 6·2
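The "Difference" row of Table IV can be checked arithmetically. The numbers of males below are back-calculated from the printed percentages (for example, 22·0 per cent of 59 is 13), which is an assumption rather than a figure given in the note. The Python sketch compares two textbook forms of the standard error of a difference between proportions; the helper names are hypothetical:

```python
from math import sqrt

# (males, total) by duration of gestation, back-calculated from Table IV.
WITH_HYDRAMNIOS = {
    "under 33": (13, 59), "33-37": (25, 93),
    "38 and over": (16, 30), "total": (54, 182),
}
WITHOUT_HYDRAMNIOS = {
    "under 33": (10, 22), "33-37": (6, 22),
    "38 and over": (10, 36), "total": (26, 80),
}


def se_pooled(m_a, n_a, m_b, n_b):
    """SE of p_a - p_b using the pooled proportion (equal-ratio hypothesis)."""
    p = (m_a + m_b) / (n_a + n_b)
    return sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))


def se_unpooled(m_a, n_a, m_b, n_b):
    """SE of p_a - p_b using each group's own proportion."""
    p_a, p_b = m_a / n_a, m_b / n_b
    return sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)


def difference_row():
    """(a) - (b) in per cent male, with pooled and unpooled SEs, per column."""
    rows = {}
    for column, (m_a, n_a) in WITH_HYDRAMNIOS.items():
        m_b, n_b = WITHOUT_HYDRAMNIOS[column]
        diff = 100 * (m_a / n_a - m_b / n_b)
        rows[column] = (
            diff,
            100 * se_pooled(m_a, n_a, m_b, n_b),
            100 * se_unpooled(m_a, n_a, m_b, n_b),
        )
    return rows


if __name__ == "__main__":
    for column, (diff, pooled, unpooled) in difference_row().items():
        print(f"{column:>12}: {diff:+6.1f} +/- {pooled:4.1f} (unpooled {unpooled:4.1f})")
```

The pooled form gives 11·3, 10·5, 12·1 and 6·2, matching the printed values, whereas the unpooled form gives about 11·9 for the under-33 column, so the printed errors appear to use the pooled proportion. The differences themselves agree with the printed row to within 0·1; the printed figures were presumably obtained by subtracting the rounded percentages.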

No explanation is offered for (a) early onset of 
labour in first births, which is observed in the 
absence of hydramnios; (b) the lower sex ratio of
first than of later births. These two observations 
appear to be independent of one another. 


SUMMARY 


(1) Records of 475 cases of anencephalus (454 
stillbirths; 28 infant deaths) are used to investigate 
the increase previously reported in sex ratio with 
duration of gestation. 


(2) It is noted that first born anencephalics are 
delivered earlier than later born, and that sex 
ratio is lower for first than for later births. These 
observations do not, however, account for the 
association of sex ratio with duration of gestation 
which remains when examination is confined to 
anencephalics of the same birth rank. 


(3) It is shown that the earlier delivery of female 
than of male anencephalics is associated with the 
presence of hydramnios. Females with hydramnios 
are delivered earlier than affected males, although the 
incidence of this complication is about the same in 
the two sexes. 